A Survey of Collectives

Kagan Tumer and David Wolpert

ABSTRACT Due to the increasing sophistication and miniaturization
of computational components, complex, distributed systems of interacting
agents are becoming ubiquitous. Such systems, where each agent aims
to optimize its own performance, but where there is a well-defined set of
system-level performance criteria, are called collectives. The fundamental
problem in analyzing/designing such systems is in determining how the
combined actions of self-interested agents lead to “coordinated” behavior
on a large scale. Examples of artificial systems which exhibit such behavior
include packet routing across a data network, control of an array of
communication satellites, coordination of multiple deployables, and dynamic
job scheduling across a distributed computer grid. Examples of natural systems
include ecosystems, economies, and the organelles within a living cell.

No current scientific discipline provides a thorough understanding of the
relation between the structure of collectives and how well they meet their
overall performance criteria. Although still very young, research on
collectives has resulted in successes both in understanding and designing such
systems. It is expected that as it matures and draws upon other disciplines
related to collectives, this field will greatly expand the range of
computationally addressable tasks. Moreover, in addition to drawing on them, such
a fully developed field of collective intelligence may provide insight into
already established scientific fields, such as mechanism design, economics,
game theory, and population biology. This chapter provides a survey of the
emerging science of collectives.


1.1 Just What is a “Collective”? 

As computing power increases, becomes cheaper, and is packed into smaller
and smaller units, a new computational paradigm, one based on adaptive
distributed computing, is emerging. Whether used for control or optimization
of complex engineered systems or for the analysis of natural systems,
this new paradigm offers new and exciting solutions to the problems of
the twenty-first century. However, before the full strength of this powerful
computational paradigm can be harnessed, some fundamental issues need
to be addressed.

In this chapter we provide a survey of approaches to large distributed
systems called collectives. A collective is a large system of agents¹, where
each agent has a private utility function it is trying to optimize using
adaptive, utility-maximizing algorithms, along with a world utility function
that measures the full system’s performance². Though systems that meet this
definition have been investigated in various fields, no current discipline
provides a general framework with which to design and study collectives.

Mechanism design, a subfield of economics, is perhaps the closest field 
addressing the “design” question posed in a collective [82, 87]. Mechanism 
design aims at finding the right “market mechanism” that will induce a 
set of agents to act in a manner specified by the system designer. Though 
this seems like a close match for what we expect a collective to achieve, 
conventional mechanism design is specifically designed for human agents
and therefore is not meant to deal with arbitrary private and world utilities.
Also, some issues essential to collectives (e.g., learning in agents) do not 
play a central role in it (see Section 1.2.3 for details). 

Game theory, on the other hand, provides a good basis for the analysis 
of collectives [11, 19, 30, 87]. However, the principal focus of game the- 
ory is on the equilibrium behavior of fully rational agents. Unfortunately, 
large adaptive real world systems seldom operate at (or near) equilibrium, 
and due to the uncertainty in the agents’ decision making, are rarely
composed of fully rational agents. Furthermore, practical issues fundamental to
collectives (e.g., scaling) are not generally addressed in game theory (see 
Section 1.2.2 for details). 

In the computer science domain, Reinforcement Learning (RL) [123, 
221] and in particular, reinforcement learning in a Multi-Agent System 
(MAS) [53, 56, 112, 192] addresses the question of how in a large dynamic 
environment, one can learn to take actions to optimize a reward function. 
In general, however, RL in a MAS does not address how the reward functions
have to be crafted so that agents collectively act to optimize a world
utility. As a consequence, in the traditional RL approach to
multi-agent systems, each agent receives the full world reward as its private
utility. Though this “solution” bypasses the incentive compatibility issue, it
ignores the scalability issue. As such, though such systems work well where
there are a small number of agents [56], they do not scale to systems with
hundreds or thousands of agents (see Section 1.2.1 for details).

Though mechanism design, game theory and reinforcement learning in
multi-agent systems provide some of the ingredients required for a
full-fledged field of collectives, they fall short of providing a suitable starting
point for the development of such a field. Furthermore, merging concepts
from one of these fields to another is in general cumbersome due to the
various assumptions, rarely explicit, that are deeply rooted in each field. What
is needed for the field of collectives to develop and mature is a common
language describing the various properties of collectives, a set of desirable
properties, a theoretical framework, and a set of problems that will provide
good testing grounds for new ideas in this field.

¹ We use the term “agent” to refer to the components of the system, though the various
fields surveyed use different terminology (e.g., “player” in game theory).

² The world utility can be provided as part of the specifications of the system, or
“constructed” by the designer, as discussed below.


1.1.1 Distinguishing Characteristics of Collectives 

Collectives can be characterized through many different distinguishing char- 
acteristics. In design problems there are many decisions (either explicit or 
implicit) that greatly affect the type of collective with which one ends up.
Similarly, there are many decisions that determine what types of problems 
can be analyzed as collectives. 

Since the chapters in this volume will focus on various design and analysis
aspects of collectives, we briefly summarize some distinguishing
characteristics of collectives. These include the presence/absence of a well-defined
world utility function; the forward/inverse approach; the presence/need
for centralized control and/or communications; the presence/absence of a
model; and scalability/robustness/adaptivity.

World Utility Function 

Having a well-defined world utility function that concerns the behavior of
the entire distributed system is crucial in the study of collectives. Such 
a world utility function provides an objective quantification of how well 
the system is performing. In that light, in a collective, we are not con- 
cerned with an unquantifiable “emergent” behavior of the system. Rather 
we are interested in how the system meets the pre-specified world utility (of 
course, nothing precludes the world utility from depending on the emergent 
behavior of the system, assuming such behavior can be quantified). 

The most natural type of world utility is a provided utility, one that
comes as part of the problem definition and specifies the overall
performance criteria that the collective needs to meet. Examples of such world
utilities include total throughput in a data network, total scientific
information gathered by a team of deployables, total information downloaded by
a constellation of satellites, the valuation of a company, or the percentage 
of available free energy exploited by an ecosystem. 

However, the lack of a provided world utility does not preclude a
collective-based approach to a problem. In such a case, assuming the agents have some
utility functions associated with them, a world utility can be constructed
(e.g., by constructing a social welfare function, as in economics). Examples of
such world utilities include the sum of agent utilities, the sum of agent
utilities combined with their variance, and the utility of the worst-off agent.
Note that optimizing each of these constructed world utilities would result in
different system behavior. What is particularly interesting in such problems is
the relationship between the agents’ initial utility functions and the utility
functions that they ought to pursue in order to optimize the constructed world
utility function.
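To make the point about constructed world utilities concrete, here is a small Python sketch that scores two hypothetical joint outcomes of three agents under three of the constructions mentioned above. The outcome values, the function names, and the choice of subtracting the population variance (with an assumed weight of 1) as one way of combining the sum with the variance are illustrative assumptions, not definitions taken from the literature.

```python
from statistics import pvariance


def sum_utility(u):
    """World utility as the plain sum of agent utilities."""
    return sum(u)


def sum_minus_variance(u, weight=1.0):
    """Sum of agent utilities penalized by their spread (weight is an assumed knob)."""
    return sum(u) - weight * pvariance(u)


def worst_off(u):
    """Rawlsian-style world utility: the utility of the worst-off agent."""
    return min(u)


# Two hypothetical joint outcomes; each list holds the three agents' utilities.
outcomes = {
    "unequal": [9.0, 1.0, 2.0],
    "equal": [3.5, 3.5, 3.5],
}

constructions = {
    "sum": sum_utility,
    "sum-var": sum_minus_variance,
    "min": worst_off,
}

# For each construction, pick the outcome it ranks highest.
choices = {
    name: max(outcomes, key=lambda k: g(outcomes[k]))
    for name, g in constructions.items()
}
print(choices)
```

Only the plain sum prefers the unequal outcome; the variance-penalized sum and the worst-off criterion both prefer the equal one, so the choice of construction alone changes which system behavior counts as "optimal".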

Forward (Analysis) vs. Inverse (Design) Problem 

Whether it has a provided or constructed world utility, a collective can be
approached from two very different perspectives: analysis (the forward
problem) and design (the inverse problem).

The forward problem focuses on how the localized attributes of a
collective induce global behavior and thereby determine system performance.
Generally, this problem arises in the study of already existing complex
systems, and is most naturally applicable to biological systems, or systems
that can be viewed as such. Examples of such systems include ecosystems
or a living cell, where in each case the local interactions (among species and
among organelles, respectively) lead to complex emergent behavior at a large scale.

Engineered systems such as processes (e.g., the space shuttle maintenance 
and refurbishment process) or (economic) organizations can also be viewed 
as forward problems in collectives. In those cases, the analysis approach 
can lead to predictive models and detect interactions among components 
of the system that may lead to breakdowns (e.g., determining whether a
component considered “safe” can cause a critical malfunction when it is 
put in interaction with another “safe” component). 

The inverse problem, on the other hand, arises when we wish to design
a system to induce behavior which optimizes the world utility. Here, the
designer either has the freedom to assign the private utility functions of
the agents (e.g., determine what each satellite or router should be doing)
or needs to design incentives that will be added to the pre-existing private
utilities of the agents (e.g., economics, where agents are humans). In either
case, though, the focus is on guiding the system towards states where the
world utility is high.

Centralized Communication or Control

Though not in the formal definition of a collective, many collectives are
decentralized systems. With few exceptions, it will be difficult, if not
impossible, to have centralized control in a collective, not only because reaching
each agent may be problematic, but more fundamentally, because in many
cases a centralized algorithm may not be able to determine what each agent
should do.

Similarly, though some amount of global communication (e.g.,
broadcasting) may be possible, in general there will be little to no centralized
communication, where a small subset of agents not only communicates
with all the other agents, but communicates differently with each one of
those other agents. Establishing the amount of allowed (or possible)
centralized communication and control will be one of the fundamental issues
in a collective.

Model-Based vs. Model-Free 

Another important characteristic of a collective is the presence/absence of a
model describing the dynamics of the system. A model-based approach
consists of:

1. Constructing a detailed model of the dynamics governing the collec- 
tive; 

2. Learning the function which maps the parameters of the model to the 
resulting dynamics of the system (in practice, this step can involve 
significant hand-tuning); and 

3. a) Drawing conclusions about this system based on the model (for- 
ward problem); 

b) Determining parameters of the model that will yield desired be- 
havior (inverse problem). 
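The three model-based steps can be sketched in Python for a deliberately tiny case. Here the "detailed model" of step 1 is assumed to be a scalar linear system x[t+1] = a*x[t] + b*u[t]; step 2 learns a and b from an observed trajectory by least squares; and step 3b solves the inverse problem of choosing a control u that the fitted model predicts will drive the state to a target. The linear model, the function names and the noise-free data are all assumptions made for illustration only.

```python
def simulate(a, b, x0, controls):
    """Stand-in for the real system: the (assumed) linear dynamics."""
    xs = [x0]
    for u in controls:
        xs.append(a * xs[-1] + b * u)
    return xs


def fit_linear_model(xs, controls):
    """Step 2: least-squares estimate of (a, b) in x[t+1] = a*x[t] + b*u[t]."""
    prev, nxt = xs[:-1], xs[1:]
    sxx = sum(x * x for x in prev)
    suu = sum(u * u for u in controls)
    sxu = sum(x * u for x, u in zip(prev, controls))
    sxy = sum(x * y for x, y in zip(prev, nxt))
    suy = sum(u * y for u, y in zip(controls, nxt))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b


def control_for_target(a, b, x, target):
    """Step 3b (inverse problem): the u for which the model predicts x_next == target."""
    return (target - a * x) / b


# Observe the "true" system (a=0.8, b=0.5) under some probing controls.
controls = [1.0, -0.5, 0.3, 2.0, 0.0]
xs = simulate(0.8, 0.5, 1.0, controls)
a_hat, b_hat = fit_linear_model(xs, controls)
u = control_for_target(a_hat, b_hat, xs[-1], target=1.0)
x_next = simulate(0.8, 0.5, xs[-1], [u])[-1]
print(round(a_hat, 6), round(b_hat, 6), round(x_next, 6))
```

Everything hinges on step 1: if the real dynamics were not linear, or drifted over time, the fitted parameters and the control derived from them would be wrong, which is the concern raised in the next paragraph.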

A fundamentally different approach, however, is to dispense with building
a model altogether, on the grounds that large, complex systems are
generally noisy, faulty, and often operate in non-stationary environments.
In such cases, coming up with a detailed model that captures the dynamics
in an accurate manner is often extraordinarily difficult.

A model-free approach, on the other hand, relies on the agents “reacting” to the
environment (e.g., through a reinforcement learning mechanism). As such,
the agents avoid explicitly modeling the system in which they operate, and in
particular, avoid the potentially infinite regress when one agent tries to 
model another’s behavior and that other agent is itself modeling the first 
agent’s behavior. 
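As a minimal sketch of an agent that only "reacts" to its environment, the Python class below keeps a running estimate of the reward each of its actions has earned and picks actions epsilon-greedily. It never builds a model of the environment or of other agents. The class name, the epsilon value and the three-armed noisy environment in the usage lines are hypothetical, chosen only to illustrate the idea.

```python
import random


class ReactiveAgent:
    """Hypothetical model-free agent: running reward averages per action, no world model."""

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.values = [0.0] * n_actions  # estimated reward of each action
        self.counts = [0] * n_actions    # how often each action was tried
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, action, reward):
        # Incremental sample average of the rewards observed for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


# A hidden environment the agent never inspects: noisy rewards around these means.
true_means = [0.2, 0.5, 0.8]
env_rng = random.Random(1)
agent = ReactiveAgent(n_actions=3)
for _ in range(2000):
    a = agent.act()
    agent.update(a, true_means[a] + env_rng.gauss(0.0, 0.1))
print(agent.counts)
```

The agent ends up favoring the best action purely from the rewards it sees; the price of skipping the model is that it must spend some actions exploring.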

The model-based vs. model-free choice has significant consequences in 
how the system can adapt, scale up, and how lessons learned from one 
domain can map to another one. A model-based approach may be the choice 
for domains where the designers can develop detailed models and have a 
moderate degree of control over the environment. However, in domains 
where detailed models are not available, or where there is reason to believe 
changes in the environment can lead to significant deviations from any 
model, a model-free approach is preferable. 

Scalability 

One of the implicit defining properties of a collective is that it is a large 
system of distributed agents. As such, scalability is a fundamental property 
of any approach that aims to study/design a collective. Though this does 
not preclude extending extant analysis/design tools appropriate for single
(or small) systems to large systems, it does suggest that in most instances,
new ways of approaching the problem are likely to be more appropriate
(e.g., a game theoretic equilibrium analysis for a million nano-devices is
unlikely to provide useful insight into the behavior of the collective).


Adaptivity 

Though scalability does not require that the system be adaptive, it pro- 
vides a strong impetus to move in that direction. Any approach that allows 
adaptivity, or learning, will have a significant advantage over one that does 
not, simply because the larger a system, the more difficult it will be to 
know a priori all the “right moves” for each agent.

Furthermore, the need for adaptivity extends beyond each agent in the 
collective. Indeed, the structure of the collective itself (e.g., the
communication channels among the agents, the agents’ utility functions) in many cases
is adaptive. In natural collectives this system-level adaptivity is generally
implicit (e.g., the interaction among species in an ecosystem or the rela- 
tionship among employees in a company), whereas in artificial systems it 
must be built in. 

Robustness 

Another desirable property of a collective is that it be robust, i.e., that the
collective not require that many details (e.g., parameters) be set just right
for it to perform well. Clearly, as the number of agents in a collective goes
up, it will become increasingly difficult to ensure failure-free operation of 
each agent. It is therefore imperative that the structure of the collective be 
insensitive to the specific operation of a small subset of its agents (e.g., in 
general the poor performance of one employee does not bring a company 
down, or the demise of a single individual does not result in the extinction 
of a species). 


1.1.2 Canonical Experimental Domains 

The previous section provided a list of distinguishing characteristics of
collectives. These characteristics are useful because they provide a common
language for a field of collectives. For example, a particular instance of data 
routing in a telecommunications network can be characterized as “a model- 
free inverse problem involving a provided world utility function where there 
is limited broadcast information but no form of global control.” 

We now provide examples of both engineered and natural systems which 
are ideally suited to be studied as collectives. For each, we provide one 
or more world utility functions, discuss how it can be approached (e.g., 
forward/inverse problem), and what assumptions (e.g., is it model-based?)
and restrictions (e.g., is global communication possible?) are present. 
• Control system for constellations of communication satellites: A
candidate world utility for this problem is a measure of (potentially
importance-weighted) information transferred. It is an example of an
inverse problem, where centralized communication or control is likely
to be difficult or impossible due to physical constraints (e.g., time
lag), and where a model of the data flow is likely to be inadequate.

• Control system for constellations of planetary exploration vehicles: A 
potential world utility for such a problem is a measure of the quality 
of scientific data collected. Though this can be viewed as an example 
of an inverse design problem (as with constellations of satellites), it 
can also be approached as a forward problem, particularly if the
vehicles have characteristics which cannot be altered (e.g., vehicles
are built and we are confronted with the problem of predicting the
behavior of the collective).

• Control system for routing over a communication network: An obvi- 
ous world utility for this problem is the total throughput of the com- 
munication network. Centralized communication or control in such a 
network is all but impossible, but some amount of broadcast infor- 
mation can filter its way to all the agents at regular time intervals. 
As an inverse problem, one would be required to design the private 
utility functions of the agents. As a forward problem on an already 
functioning network, one could determine the stress points of the sys- 
tem, or the states which would cause the largest congestions in the 
network. 

• Air Space Management: Given a problem specification where there is 
some leeway in modifying the course and speed of airplanes, a poten- 
tial world utility is minimizing delays at airports. The system design- 
ers are faced with the inverse problem of determining the incentives 
for the agents (whether they be pilots or air traffic controllers) so that 
their behavior (e.g., arrival times to the airport’s airspace) optimizes 
the world utility. This is a case where though global communication 
is possible, global control is not. 

• Managing a power grid: A world utility based on the efficiency of the 
grid would be a good starting point for an inverse problem, involving 
some degree of centralized communication or control. An alternative 
world utility may be robustness. In such a case a forward problem 
would involve finding how quickly the system responds to certain 
disturbances, and how the system interactions can be modified so as 
to limit the propagation of those disturbances. 

• Job scheduling across a computational grid: A candidate world utility 
is the efficiency in processing the jobs entering the system. This prob- 
lem is very similar to managing a power grid, but provides a glimpse 
at the inverse problem: how should one set the rewards of the
computational nodes so that they collectively process the largest number of jobs?
A model-free solution involving learners at the computational
nodes would be based on limited global communication.

• Control of the elements of a nanocomputer: A potential world utility 
for this problem is how well certain computations are carried out 
by the nanocomputer. In an inverse problem, one would focus on 
determining the structure of the adaptive system which would lead 
the agents to perform the desired computations. A particular instance 
of an inverse problem of this nature is the selection of subsets of faulty 
devices, where the world utility is total aggregate error of the selected 
devices. 

• Study of a protocell: A potential world utility for this problem is the
length of time the protocell maintains its functionality. As a forward
problem, this problem consists of modeling the behavior of the system
based on the organelles and their functions/interactions. With more 
leeway in the definition of the functions the organelles perform, one 
can view this as an interesting inverse problem: What should the 
organelles try to achieve to maintain the structure and functionality 
of the protocell? 

• Study/Design of an ecosystem: One world utility for the study of an 
ecosystem is the total bio-mass of the ecosystem. In a model-based 
forward problem, one can study the effect of various interactions on 
the world utility. Alternatively, as an inverse problem, one can in- 
vestigate how to design an ecosystem which will provide the best 
sustainable bio-diversity for a given mass (e.g., for a long-term space
mission).

• Design of incentives in a company: A “simple” world utility
for a company is the valuation of the company (share price times
the number of outstanding shares). The inverse problem consists of
determining how to design incentives that will induce the company’s
valuation to go up (e.g., what set of salaries/benefits/stock options
will induce the employees to take actions that will benefit the
corporation).
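The nanocomputer bullet above mentions selecting subsets of faulty devices so that their aggregate error is small. A brute-force Python sketch of that inverse problem follows, under loudly stated assumptions: each device is given a hypothetical signed, systematic error (integers in arbitrary units), and the aggregate error of a subset is taken to be the magnitude of the average error of the selected devices, so that errors of opposite sign can cancel. The function names are illustrative.

```python
from itertools import combinations


def aggregate_error(errors):
    """Assumed world-utility measure (to minimize): magnitude of the averaged error."""
    return abs(sum(errors) / len(errors))


def best_subset(device_errors):
    """Exhaustively search all non-empty subsets of devices (2**n - 1 of them)."""
    n = len(device_errors)
    best, best_err = None, float("inf")
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            err = aggregate_error([device_errors[i] for i in idx])
            if err < best_err:  # keep the first subset that attains the minimum
                best, best_err = idx, err
    return best, best_err


# Hypothetical signed errors of five imperfect devices.
errors = [30, -10, 25, -20, 5]
print(best_subset(errors))      # ((0, 1, 3), 0.0)
print(aggregate_error(errors))  # 6.0 when every device is used
```

The search visits 2**n - 1 subsets, so it is only a reference point for a handful of devices; large instances are exactly where the distributed, adaptive approaches discussed in this chapter become necessary.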

All of these problems share the property that they are inherently
distributed systems where the interactions among the agents lead to complex
behavior. Though each one can be approached by conventional methods,
how those methods need to be modified to suit the particular application 
will be different in each case. The aim of this chapter is to both accentuate 
the similarities among these problems and also to highlight the need for a 
general approach which would address all these problems within the same 
framework. 
1.2 Review of Literature Related to Collectives

There are many approaches to analyzing and designing collectives that do 
not exactly meet the needs of a “field of collectives” yet provide some part
of the equation. The rest of this section consists of brief presentations of 
some of these approaches, and in particular characterizes them in terms of 
the properties of collectives discussed above. 


1.2.1 AI and Machine Learning 

There is an extensive body of work in AI and machine learning that is 
related to the design of collectives. Indeed, one of the most famous specu- 
lative works in the field can be viewed as an argument that AI should be 
approached as a design of collectives problem [163]. Below, we discuss some 
topics relevant to collectives from this domain. 

Distributed Artificial Intelligence 

The field of Distributed Artificial Intelligence (DAI) has arisen as more and 
more traditional Artificial Intelligence (AI) tasks have migrated toward par- 
allel implementation. The most direct approach to such implementations 
is to directly parallelize AI production systems or the underlying program- 
ming languages [79, 189]. An alternative and more challenging approach 
is to use distributed computing, where not only are the individual reason- 
ing, planning and scheduling AI tasks parallelized, but there are different 
modules with different such tasks, concurrently working toward a common 
goal [118, 119, 143]. 

In DAI, one needs to ensure that the task has been modularized in a
way that improves efficiency. Unfortunately, this usually requires a central 
controller whose purpose is to allocate tasks and process the associated 
results. Moreover, designing that controller in a traditional AI fashion of- 
ten results in brittle solutions. Accordingly, recently there has been a move 
toward both more autonomous modules and fewer restrictions on the in- 
teractions among the modules [194]. 

Despite this evolution, DAI maintains the traditional AI concern with
a pre-fixed set of particular aspects of intelligent behavior (e.g., reasoning,
understanding, learning) rather than with their cumulative character. As
the idea that intelligence may have more to do with the interaction among
components started to take shape [41, 42], focus shifted to concepts (e.g.,
multi-agent systems) that better incorporated that idea [121].

Multi-Agent Systems

The field of Multi-Agent Systems (MAS) is concerned with the interactions
among the members of a set of agents [40, 92, 121, 204, 222], as
well as the inner workings of each agent in such a set (e.g., their learning
algorithms) [36, 37, 38]. As in computational ecologies and computational
markets (see below), a well-designed MAS is one that achieves a global
task through the actions of its components. The associated design steps
involve [121]:

1. Decomposing a global task into distributable subcomponents, yield- 
ing tractable tasks for each agent; 

2. Establishing communication channels that provide sufficient
information to each of the agents for it to achieve its task, but are not too
unwieldy for the overall system to sustain; and

3. Coordinating the agents in a way that ensures that they cooperate
on the global task, or at the very least does not allow them to pursue
conflicting strategies in trying to achieve their tasks.

Step (3) is rarely trivial: one of the main difficulties encountered in MAS 
design is that agents act selfishly and artificial cooperation structures have 
to be imposed on their behavior to enforce cooperation [13]. An active area 
of research, which holds promise for addressing parts of the design of
collectives problem, is to determine how selfish agents’ “incentives” have to
be engineered in order to avoid problems such as the tragedy of the
commons (TOC) [209]. (This work draws on the economics literature, which
we review separately below.) When simply providing the right incentives
is not sufficient, one can resort to strategies that actively induce agents to 
cooperate rather than act selfishly. In such cases coordination [205], nego- 
tiations [135], coalition formation [193, 195, 249] or contracting [3] among 
agents may be needed to ensure that they do not work at cross purposes. 
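The tragedy of the commons can be reproduced in a few lines of Python. In this hypothetical setup, each of n agents decides whether to graze (1) or abstain (0) on a shared pasture; grazing earns the agent a private benefit, while every grazer causes damage that is split equally among all agents. The parameter values (benefit 1, damage 3, ten agents) and function names are illustrative assumptions. Because each agent bears only 1/n of the damage it causes, grazing is always its best response, even though the world utility (the sum of all private payoffs) is highest when nobody grazes.

```python
def private_payoff(i, actions, benefit=1.0, damage=3.0):
    """Agent i's utility: its own grazing benefit minus its share of total damage."""
    n = len(actions)
    return benefit * actions[i] - damage * sum(actions) / n


def world_utility(actions, **params):
    """World utility: the sum of all agents' private payoffs."""
    return sum(private_payoff(i, actions, **params) for i in range(len(actions)))


def best_response(i, actions, **params):
    """The action (0 = abstain, 1 = graze) maximizing agent i's private payoff."""
    def payoff_if(a):
        trial = list(actions)
        trial[i] = a
        return private_payoff(i, trial, **params)
    return max((0, 1), key=payoff_if)


n = 10
for start in ([0] * n, [1] * n):
    # Whatever the others do, every agent's selfish choice is to graze.
    print([best_response(i, start) for i in range(n)])
print(world_utility([1] * n), world_utility([0] * n))  # -20.0 0.0
```

Engineering incentives here would mean changing `private_payoff` so that an agent's selfish choice and the world utility point the same way.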

Unfortunately, all of these approaches share with DAI and its offshoots
the problem of relying on hand-tailoring, and therefore being difficult to
scale and often nonrobust. In addition, except as noted in the next
subsection, they involve little to no adaptivity, and therefore the constituent
computational elements are usually not as robust as they would need to be
to provide the foundation for the field of collectives.

Reinforcement Learning 

The maturing field of Reinforcement Learning (RL) provides a much-needed
tool for the types of problems addressed by collectives. In RL, an agent
receives reward signals from its environment, and the goal of an
RL algorithm is to determine how, using those reward signals, the agent
should update its action policy to maximize its utility [123, 220, 221, 232].
Because RL generally provides model-free³ and “online” learning features,
it is ideally suited for the distributed environment where a “teacher” is
not available and the agents need to learn successful strategies based on
“rewards” and “penalties” they receive from the overall system at various
intervals. It is even possible for the learners to use those rewards to modify
how they learn [199, 200].

³ There exist some model-based variants of traditional RL. See for example [8].

Although work on RL dates back to Samuel’s checkers player [191],
relatively recent theoretical [232] and empirical results [56, 224] have made
RL one of the most active areas in machine learning. Many problems ranging
from controlling a robot’s gait to controlling a chemical plant to
allocating constrained resources have been addressed with considerable
success using RL [97, 114, 166, 186, 247]. In particular, the RL algorithms
TD(λ) (which rates potential states based on a value function) [220] and
Q-learning (which rates action-state pairs) [232] have been investigated
extensively. A detailed investigation of RL is available in [123, 221, 232].
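Since Q-learning (which rates action-state pairs) is singled out above, a minimal tabular sketch in Python may help. The environment is a hypothetical five-state chain in which the agent can step left or right and receives a reward of 1 only on reaching the last state; the learning rate, discount factor, exploration rate and episode count are assumed values. The update moves Q(s, a) toward r + γ·max Q(s′, ·), the standard one-step Q-learning target.

```python
import random


def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain: move left (0) or right (1);
    reaching the last state ends the episode with reward 1."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Bootstrapped target: no future value once the goal is reached.
            future = 0.0 if s_next == goal else max(q[s_next])
            q[s][a] += alpha * (r + gamma * future - q[s][a])
            s = s_next
    return q


q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(4)]
print(policy)
```

After training, the greedy policy should step right in every state, and Q(s, right) should approach 1, 0.9, 0.81 and 0.729 from state 3 back to state 0. Note that the table has one entry per state-action pair, which is what makes this approach impractical for the joint action space of a large collective.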

Intuitively, one might hope that RL would help us solve the distributed
control problem, since RL is adaptive and, in general, model-free.
However, by itself, conventional single-agent RL does not provide a means for
controlling large, distributed systems. The problem is that the space of
possible action policies for such systems is too big to be searched. So although
powerful and widely applicable, solitary RL algorithms will not generally
perform well on large distributed heterogeneous problems. It is, however,
natural to consider deploying many RL algorithms rather than a single one 
for these large distributed problems. 


Reinforcement Learning-Based Multi-Agent Systems

Because it requires neither explicit modeling of the environment nor
a “teacher” that provides the “correct” actions, the approach of having the
individual agents in a MAS use RL is well-suited for MAS’s deployed in
domains where one has little knowledge about the environment and/or
other agents. There are two main approaches to designing such MAS’s:

(i) One has ‘solipsistic agents’ that don’t know about each other and whose
RL rewards are given by the performance of the entire system (so the joint 
actions of all other agents form an “inanimate background” contributing 
to the reward signal each agent receives); 

(ii) One has ‘social agents’ that explicitly model each other and take each
others’ actions into account. 

Both (i) and (ii) can be viewed as ways to (try to) coordinate the agents 
in a MAS in a robust fashion. 

Solipsistic Agents: MAS’s with solipsistic agents have been successfully
applied to a multitude of problems [56, 96, 107, 192, 198]. However, scaling
to large systems is a major issue with solipsistic agents. The problem is
that each agent must be able to discern the effect of its actions on the
overall performance of the system, since that performance constitutes its
reward signal. As the number of agents increases, though, the effects of any
one agent’s actions (signal) will be swamped by the effects of other agents
(noise), making the agent unable to learn well, if at all. In addition, of
course, solipsistic agents cannot be used in situations lacking centralized
calculation and broadcast of the single global reward signal.

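The signal-to-noise argument can be checked numerically. In the hypothetical Python simulation below, every agent independently picks an action of -1 or +1 and the global reward broadcast to all of them is simply the sum of those actions; we then measure how strongly one agent's own action correlates with the reward it receives. For this particular reward the population correlation is exactly 1/√n (the covariance is 1 and the reward's variance is n), so the signal an agent can learn from shrinks as agents are added. The sum-of-actions reward, the function names and the sample sizes are illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def own_action_signal(n_agents, trials=5000, seed=0):
    """Correlation between agent 0's action and a global reward summing all actions."""
    rng = random.Random(seed)
    mine, rewards = [], []
    for _ in range(trials):
        actions = [rng.choice((-1, 1)) for _ in range(n_agents)]
        mine.append(actions[0])
        rewards.append(sum(actions))
    return pearson(mine, rewards)


for n in (1, 10, 100):
    # Measured correlation next to the theoretical value 1/sqrt(n).
    print(n, round(own_action_signal(n), 3), round(n ** -0.5, 3))
```
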
Social Agents: MAS’s whose agents take the actions of other agents into
account synthesize RL with game theoretic concepts (e.g., Nash equilibrium).
They do this to try to ensure that the overall system both moves
toward achieving the overall global goal and avoids often deleterious
oscillatory behavior [53, 85, 111, 113, 112]. To that end, the agents incorporate
internal mechanisms that actively model the behavior of other agents. In
general this approach involves hand-tailoring for the problem, and there are
some well-studied domains (e.g., the El Farol Bar problem) in which such
modeling is self-defeating [5, 238].
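A stripped-down Python illustration of why modeling other agents can be self-defeating in the El Farol Bar problem: suppose all 100 agents share one (hypothetical) predictor of attendance and go to the bar only if the predicted attendance is below its capacity of 60. Whatever the shared prediction is, acting on it makes it wrong. This is a simplification made for illustration; in the original problem each agent draws on its own set of predictors, and the resulting dynamics are more subtle.

```python
def attendance(predicted, n_agents=100, capacity=60):
    """Every agent applies the same rule: go if and only if the bar is predicted uncrowded."""
    return n_agents if predicted < capacity else 0


def prediction_holds(predicted, n_agents=100, capacity=60):
    """Does reality fall on the same side of the capacity as the shared prediction?"""
    actual = attendance(predicted, n_agents, capacity)
    return (predicted < capacity) == (actual < capacity)


# Try every possible shared prediction: none of them survives being acted upon.
print(any(prediction_holds(p) for p in range(101)))  # False
```
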

1.2.2 Game Theory 

Game theory is the branch of mathematics concerned with formalized ver- 
sions of “games”, in the sense of chess, poker, nuclear arms races, and the 
like [11, 19, 30, 73, 87, 148, 66, 207]. It is perhaps easiest to describe it by 
loosely defining some of its terminology, which we do here and in the next 
subsection. 

The simplest form of a game is that of the ‘non-cooperative single-stage
extensive-form’ game, which involves the following situation: There are
two or more agents (called ‘players’ in the literature), each of which has a
pre-specified set of possible actions that it can follow. (A ‘finite’ game has
finite sets of possible actions for all the players.) In addition, each agent i
has a utility function (also called a ‘payoff matrix’ for finite games). This
maps any ‘profile’ of the action choices of all agents to an associated utility
value for agent i. (In a ‘zero-sum’ game, for every profile, the sum of the
payoffs to all the agents is zero.)
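A finite game in this sense can be written down directly as a table from action profiles to payoff vectors. The sketch below uses matching pennies as a hypothetical example (player 0 wins on a match, player 1 on a mismatch). The helper names are illustrative, and the checks simply restate the definitions above: every profile has a payoff for each agent, an agent’s payoff matrix is read off that table, and a zero-sum game has payoffs summing to zero in every profile.

```python
from itertools import product

# Hypothetical example: matching pennies. Player 0 wins on a match, player 1 on a mismatch.
ACTIONS = [("H", "T"), ("H", "T")]   # each player's finite action set
PAYOFFS = {                          # profile -> (utility of player 0, utility of player 1)
    ("H", "H"): (1, -1),
    ("H", "T"): (-1, 1),
    ("T", "H"): (-1, 1),
    ("T", "T"): (1, -1),
}

def covers_all_profiles(actions, payoffs):
    """True if every profile of action choices has an associated payoff vector."""
    return all(profile in payoffs for profile in product(*actions))

def payoff_matrix(actions, payoffs, player):
    """Player's payoff matrix: rows are player 0's actions, columns are player 1's."""
    return [[payoffs[(a, b)][player] for b in actions[1]] for a in actions[0]]

def is_zero_sum(payoffs):
    """True if the payoffs to all agents sum to zero in every profile."""
    return all(sum(u) == 0 for u in payoffs.values())

if __name__ == "__main__":
    print(payoff_matrix(ACTIONS, PAYOFFS, 0))  # [[1, -1], [-1, 1]]
    print(is_zero_sum(PAYOFFS))                # True
```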

The agents choose their actions in a sequence, one after the other. The
structure determining what each agent knows concerning the action choices
of the preceding agents is known as the ‘information set’.⁴ Games in which
each agent knows exactly what the preceding (‘leader’) agent did are known
as ‘Stackelberg games’.

In a ‘multi-stage’ game, after all the agents choose their first action,
each agent is provided some information concerning what the other agents
did. The agent uses this information to choose its next action. In the usual
formulation, each agent gets its payoff at the end of all of the game’s stages.

An agent’s ‘strategy’ is the rule it elects to follow mapping the informa-
tion it has at each stage of a game to its associated action. It is a ‘pure
strategy’ if it is a deterministic rule. If instead the agent’s action is chosen
by randomly sampling from a distribution, that distribution is known as a
‘mixed strategy’. Note that an agent’s strategy concerns all possible se-
quences of provided information, even any that cannot arise due to the
strategies of the other agents.

⁴ While stochastic choices of actions are central to game theory, most of the work in
the field assumes the information in information sets is in the form of definite facts,
rather than a probability distribution. Accordingly, there has been relatively little work
incorporating Shannon information theory into the analysis of information sets.

Any multi-stage extensive-form game can be converted into a ‘normal
form’ game, which is a single-stage game in which each agent is ignorant
of the actions of the other agents, so that all agents choose their actions
“simultaneously”. This conversion is achieved by having the “actions” of
each agent in the normal form game correspond to an entire strategy in the
associated multi-stage extensive-form game. The payoffs to all the agents
in the normal form game for a particular strategy profile are then given by
the associated payoff matrices of the multi-stage extensive-form game.
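The conversion can be illustrated for the smallest interesting case, a hypothetical two-move Stackelberg-style game in which a leader picks L or R and the follower sees that choice before picking l or r (the payoff numbers are made up). In the normal form each follower “action” is a whole strategy, i.e. a map from what it might observe to what it does. The follower therefore has 2^2 = 4 strategies, including ones that specify responses to leader moves that never occur in a given profile.

```python
from itertools import product

LEADER_ACTIONS = ("L", "R")
FOLLOWER_ACTIONS = ("l", "r")

# Hypothetical payoffs (leader, follower) for each complete play of the game.
OUTCOMES = {("L", "l"): (2, 1), ("L", "r"): (0, 0),
            ("R", "l"): (3, 0), ("R", "r"): (1, 2)}

def follower_strategies(leader_actions, follower_actions):
    """Every deterministic rule mapping the observed leader action to a reply."""
    for replies in product(follower_actions, repeat=len(leader_actions)):
        yield dict(zip(leader_actions, replies))

def to_normal_form(leader_actions, follower_actions, outcomes):
    """Single-stage game whose actions are the strategies of the two-stage game."""
    normal = {}
    for a in leader_actions:
        for rule in follower_strategies(leader_actions, follower_actions):
            key = (a, tuple(sorted(rule.items())))  # hashable label for the rule
            normal[key] = outcomes[(a, rule[a])]    # only the reply to `a` is played
    return normal

if __name__ == "__main__":
    table = to_normal_form(LEADER_ACTIONS, FOLLOWER_ACTIONS, OUTCOMES)
    print(len(table))  # 8 = 2 leader actions x 4 follower strategies
```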

Nash Equilibrium 

A ‘solution’ to a game, or an ‘equilibrium’, is a profile in which every agent
behaves “rationally”. This means that every agent’s choice of strategy opti-
mizes its utility subject to a pre-specified set of conditions. In conventional
game theory those conditions involve, at a minimum, perfect knowledge of
the payoff matrices of all other players, and often also involve specification
of what strategies the other agents adopted and the like. In particular,
a ‘Nash equilibrium’ is a profile where each agent has chosen the best
strategy it can, given the choices of the other agents. A game may have no
Nash equilibria, one equilibrium, or many equilibria in the space of pure
strategies. A beautiful and seminal theorem due to Nash proves that every
finite game has at least one Nash equilibrium in the space of mixed strategies
[171].
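For small finite games, pure-strategy Nash equilibria can be found by brute force: check every profile and ask whether any single agent could gain by switching only its own action. The sketch below does exactly that for three textbook-style games with made-up payoffs, covering the three cases above (one, none, and several pure equilibria). It does not search mixed strategies, so for matching pennies it returns nothing even though, by Nash’s theorem, a mixed equilibrium (each player randomizing 50/50) exists.

```python
from itertools import product

def pure_nash_equilibria(actions, payoffs):
    """Return every pure-strategy profile at which no player can raise its own
    payoff by changing only its own action."""
    equilibria = []
    for profile in product(*actions):
        if all(
            payoffs[profile][i] >= payoffs[profile[:i] + (alt,) + profile[i + 1:]][i]
            for i, options in enumerate(actions)
            for alt in options
        ):
            equilibria.append(profile)
    return equilibria

# Hypothetical example games; payoffs are (player 0, player 1).
PD_ACTIONS = [("C", "D"), ("C", "D")]
PRISONERS_DILEMMA = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
                     ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
MP_ACTIONS = [("H", "T"), ("H", "T")]
MATCHING_PENNIES = {("H", "H"): (1, -1), ("H", "T"): (-1, 1),
                    ("T", "H"): (-1, 1), ("T", "T"): (1, -1)}
CO_ACTIONS = [("A", "B"), ("A", "B")]
COORDINATION = {("A", "A"): (2, 2), ("A", "B"): (0, 0),
                ("B", "A"): (0, 0), ("B", "B"): (1, 1)}

if __name__ == "__main__":
    print(pure_nash_equilibria(PD_ACTIONS, PRISONERS_DILEMMA))  # [('D', 'D')]
    print(pure_nash_equilibria(MP_ACTIONS, MATCHING_PENNIES))   # []
    print(pure_nash_equilibria(CO_ACTIONS, COORDINATION))       # [('A', 'A'), ('B', 'B')]
```

The coordination game also shows why equilibrium alone is a weak guarantee: (B, B) is an equilibrium even though both players would prefer (A, A).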

There are several different reasons one might expect a game to result
in a Nash equilibrium. One is that it is the point that perfectly ratio-
nal Bayesian agents would adopt, assuming the probability distributions
they used to calculate expected payoffs were consistent with one another
[10, 124]. A related reason, arising even in a non-Bayesian setting, is that
a Nash equilibrium provides “consistent” predictions, in that if all parties
predict that the game will converge to a Nash equilibrium, no one will ben-
efit by changing strategies. Having a consistent prediction does not ensure
that all agents’ payoffs are maximized, though. The study of small pertur-
bations around Nash equilibria from a stochastic dynamics perspective is
just one example of a ‘refinement’ of Nash equilibrium, that is, a criterion
for selecting a single equilibrium state when more than one is present [154].

Cooperative Game Theory 

In cooperative game theory the agents are able to enter binding contracts
with one another, and thereby coordinate their strategies. This allows the
agents to avoid being “stuck” in Nash equilibria that are Pareto inefficient,
that is, being stuck at equilibrium profiles in which all agents would benefit
if only they could agree to all adopt different strategies, with no possibility
of betrayal. The characteristic function of a game involves subsets (‘coali-
tions’) of agents playing the game. For each such subset, it gives the sum
of the payoffs of the agents in that subset that those agents can guarantee
if they coordinate their strategies. An imputation is a division of such a
guaranteed sum among the members of the coalition. It is often the case
that for a subset of the agents in a coalition one imputation dominates
another, meaning that under threat of leaving the coalition that subset
of agents can demand the first imputation rather than the second. So the
problem each agent i is confronted with in a cooperative game is which set
of other agents to form a coalition with, given the characteristic function
of the game and the associated imputations i can demand of its partners.
There are several different kinds of solution for cooperative games that have
received detailed study, varying in how the agents address this problem of
whom to form a coalition with. Some of the more popular are the ‘core’, the
‘Shapley value’, the ‘stable set solution’, and the ‘nucleolus’.
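Of the solution concepts just listed, the Shapley value is the easiest to compute for small games. It pays each agent its marginal contribution to the characteristic function, averaged over every order in which the agents could join the grand coalition. The sketch below enumerates all orderings, which is only feasible for a handful of agents. The three-player ‘glove game’ it uses (one agent holding a left glove, two holding right gloves, a matched pair worth 1) is a standard illustrative example, not one from the text.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, v):
    """Average marginal contribution of each player over all join orders.

    `v` is the characteristic function: it maps a frozenset of players (a
    coalition) to the total payoff that coalition can guarantee itself."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            totals[p] += v(joined) - v(coalition)
            coalition = joined
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}

def glove_game(coalition):
    """Hypothetical example: A holds a left glove, B and C hold right gloves;
    a coalition is worth 1 if it can assemble a matched pair, else 0."""
    return 1.0 if "A" in coalition and ("B" in coalition or "C" in coalition) else 0.0

if __name__ == "__main__":
    print(shapley_values(["A", "B", "C"], glove_game))  # A: 2/3, B and C: 1/6 each
```

The values sum to the worth of the grand coalition, and the scarce left-glove holder A receives most of it.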

In the real world, the actual underlying game the agents are playing
does not only involve the actions considered in cooperative game theory’s
analysis of coalitions and imputations. The strategies of that underlying
game also involve bargaining behavior, considerations of trying to cheat
on a given contract, bluffing and threats, and the like. In many respects,
by concentrating on solutions for coalition formation and their relation
with the characteristic function, cooperative game theory abstracts away
these details of the true underlying game. Conversely, though, progress has
recently been made in understanding how cooperative games can arise from
non-cooperative games, as they must in the real world [11].


Evolution and Learning in Games 

Not surprisingly, game theory has come to play a large role in the field of
multi-agent systems. In addition, due to Darwinian natural selection, one
might expect game theory to be quite important in population biology, in
which the “utility functions” of the individual agents can be taken to be
their reproductive fitness. There is an entire subfield of game theory con-
cerned with this connection with population biology, called ‘evolutionary
game theory’ [155, 157].

To introduce evolutionary game theory, consider a game in which all
players share the same space of possible strategies, and there is an ad-
ditional space of possible ‘attribute vectors’ that characterize an agent,
along with a probability distribution g across that new space. (Examples
of attributes in the physical world could be things like size, speed, etc.) We
select a set of agents to play a game by randomly sampling g. Those agents’
attribute vectors jointly determine the payoff matrices of each of the indi-
vidual agents. (Intuitively, what benefit accrues to an agent for taking a
particular action depends on its attributes and those of the other agents.)
However, each agent i has limited information concerning both its attribute
vector and that of the other players in the game, information encapsulated
in an ‘information structure’. The information structure specifies how much
each agent knows concerning the game it is playing.

In this context, we enlarge the meaning of the term “strategy” to not just
be a mapping from information sets and the like to actions, but from entire
information structures to actions. In addition to the distribution g over
attribute vectors, we also have a distribution over strategies, h. A strategy
s is a ‘population strategy’ if h is a delta function about s. Intuitively, we
have a population strategy when each animal in a population “follows the
same behavioral rules”, rules that take as input what the animal is able to
discern about its strengths and weaknesses relative to those other members
of the population, and produce as output how the animal will act in the
presence of such animals.

Given g, a population strategy centered about s, and its own attribute
vector, any player i in the support of g has an expected payoff for any
strategy it might adopt. When i’s payoff could not improve if it were to
adopt any strategy other than s, we say that s is ‘evolutionary stable’.
Intuitively, an evolutionary stable strategy is one that is stable with respect
to the introduction of mutants into the population.

Now consider a sequence of such evolutionary games. Interpret the pay-
off that any agent receives after being involved in such a game as the
‘reproductive fitness’ of that agent, in the biological sense. So the higher
the payoff the agent receives, in comparison to the fitnesses of the other
agents, the more “offspring” it has that get propagated to the next game.
In the continuum-time limit, where games are indexed by the real number
t, this can be formalized by a differential equation. This equation specifies
the derivative of $g_t$ evaluated for each agent i’s attribute vector, as a mono-
tonically increasing function of the relative difference between the payoff of
i and the average payoff of all the agents. (We also have such an equation
for h.) The resulting dynamics is known as ‘replicator dynamics’, with an
evolutionary stable population strategy, if it exists, being one particular
fixed point of the dynamics.
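The replicator equation is easiest to see in its common textbook form, which simplifies the setup above: instead of a distribution over attribute vectors, track the population share $x_k$ of each pure strategy k, with $\dot{x}_k = x_k (f_k(x) - \bar{f}(x))$, where $f_k$ is strategy k’s payoff against the current population and $\bar{f}$ the population average. The sketch integrates this with a plain Euler step for a Hawk-Dove game with assumed values V = 2 and C = 4. For those payoffs the evolutionarily stable mix has a hawk share of V/C = 0.5, and the dynamics converge to it from either side.

```python
# Hawk-Dove payoffs to the row strategy against the column strategy,
# assuming resource value V = 2 and fighting cost C = 4 (hypothetical numbers).
V, C = 2.0, 4.0
HAWK_DOVE = [[(V - C) / 2, V],       # hawk vs hawk, hawk vs dove
             [0.0,         V / 2]]   # dove vs hawk, dove vs dove

def replicator_step(x, payoff, dt):
    """One Euler step of dx_k/dt = x_k * (f_k - mean fitness)."""
    n = len(x)
    fitness = [sum(payoff[k][j] * x[j] for j in range(n)) for k in range(n)]
    mean_fitness = sum(x[k] * fitness[k] for k in range(n))
    new_x = [x[k] + dt * x[k] * (fitness[k] - mean_fitness) for k in range(n)]
    total = sum(new_x)
    return [v / total for v in new_x]  # guard against floating-point drift off the simplex

def simulate(x0, payoff, dt=0.01, steps=5000):
    """Integrate the replicator dynamics from shares x0 up to time dt * steps."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x

if __name__ == "__main__":
    for start in ([0.1, 0.9], [0.9, 0.1]):
        print(start, "->", simulate(start, HAWK_DOVE))  # hawk share approaches 0.5
```

At a hawk share of 0.5 both strategies earn the same payoff (0.5), which is what makes that mix a fixed point.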

Now consider removing the reproductive aspect of evolutionary game
theory, and instead have each agent propagate to the next game, with
“memory” of the events of the preceding game. Furthermore, allow each
agent to modify its strategy from one game to the next by “learning” from
its memory of past games, in a bounded rational manner. The field of
learning in games is concerned with exactly such situations [86, 12, 17, 26,
70, 126, 178, 173]. Most of the formal work in this field involves simple
models for the learning process of the agents. For example, in ‘fictitious
play’ [86], in each successive game, each agent i adopts what would be
its best strategy if its opponents chose their strategies according to the
empirical frequency distribution of such strategies that i has encountered in
the past. More sophisticated versions of this work employ simple Bayesian
learning algorithms, or re-inventions of some of the techniques of the RL
community [190]. Typically in learning in games one defines a payoff to
the agent for a sequence of games, for example as a discounted sum of the
payoffs in each of the constituent games. Within this framework one can
study the long-term effects of strategies such as cooperation and see if they
arise naturally and if so, under what circumstances.
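Fictitious play is simple enough to sketch in a few lines for two-player finite games. Each round, each agent counts how often its opponent has played each action so far and picks the action with the highest payoff against those empirical frequencies (ties are broken by list order, an arbitrary choice made here). The payoff tables are hypothetical examples. In a two-player zero-sum game such as matching pennies the empirical frequencies approach the 50/50 mixed equilibrium, a classical result for that class of games, while in the prisoner’s dilemma play quickly settles on mutual defection.

```python
def best_response(payoffs, actions, player, opponent_counts):
    """Action with the highest total payoff against the opponent's past plays,
    which ranks actions the same way as expected payoff under the empirical
    frequencies. Ties go to the first action listed."""
    def score(action):
        total = 0
        for other, count in opponent_counts.items():
            profile = (action, other) if player == 0 else (other, action)
            total += count * payoffs[profile][player]
        return total
    return max(actions[player], key=score)

def fictitious_play(actions, payoffs, rounds):
    """Return each player's empirical frequency of play after `rounds` games."""
    counts = [{a: 0 for a in actions[0]}, {a: 0 for a in actions[1]}]
    for _ in range(rounds):
        a0 = best_response(payoffs, actions, 0, counts[1])
        a1 = best_response(payoffs, actions, 1, counts[0])
        counts[0][a0] += 1
        counts[1][a1] += 1
    return [{a: c / rounds for a, c in player_counts.items()} for player_counts in counts]

# Hypothetical example games; payoffs are (player 0, player 1).
PENNIES_ACTIONS = [("H", "T"), ("H", "T")]
PENNIES = {("H", "H"): (1, -1), ("H", "T"): (-1, 1),
           ("T", "H"): (-1, 1), ("T", "T"): (1, -1)}
PD_ACTIONS = [("C", "D"), ("C", "D")]
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

if __name__ == "__main__":
    print(fictitious_play(PENNIES_ACTIONS, PENNIES, 10000))  # both close to 50/50
    print(fictitious_play(PD_ACTIONS, PD, 100))              # almost always "D"
```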

Many aspects of real world games that do not occur very naturally oth-
erwise arise spontaneously in these kinds of games. For example, when the
number of games to be played is not pre-fixed, it may behoove a particular
agent i to treat its opponent better than it would otherwise, since i may
have to rely on that other agent’s treating it well in the future, if they end
up playing each other again. This framework also allows us to investigate
the dependence of evolving strategies on the amount of information avail-
able to the agents [159]; the effect of communication on the evolution of
cooperation [160, 162]; and the parallels between auctions and economic
theory [108, 161].
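The effect of an open-ended horizon can be checked with a discounted repeated prisoner’s dilemma, where the discount factor δ can be read as the probability that the agents meet again. The strategies (tit-for-tat and always-defect) and payoff numbers below are standard illustrative choices, not taken from the cited works. With these payoffs, sustained mutual cooperation is worth 3/(1 - δ), whereas defecting against tit-for-tat earns 5 + δ/(1 - δ), so cooperating is the better reply exactly when δ > 1/2. The sketch truncates the infinite sum at a finite horizon.

```python
# Hypothetical prisoner's dilemma payoffs, listed as (player a, player b).
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def discounted_payoffs(strategy_a, strategy_b, delta, rounds=200):
    """Discounted sums of per-round payoffs, truncated after `rounds` games."""
    hist_a, hist_b = [], []
    total_a = total_b = 0.0
    for t in range(rounds):
        a = strategy_a(hist_b)  # each strategy sees only the other's past moves
        b = strategy_b(hist_a)
        pay_a, pay_b = PD[(a, b)]
        total_a += delta ** t * pay_a
        total_b += delta ** t * pay_b
        hist_a.append(a)
        hist_b.append(b)
    return total_a, total_b

if __name__ == "__main__":
    for delta in (0.9, 0.3):
        cooperate = discounted_payoffs(tit_for_tat, tit_for_tat, delta)[0]
        defect = discounted_payoffs(always_defect, tit_for_tat, delta)[0]
        print(delta, round(cooperate, 2), round(defect, 2))
```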

In many respects, learning in games is even more relevant to the study
of collectives than is traditional game theory. However, in general it lacks a
well-defined world utility and is almost exclusively focused on the forward
problem, making it a difficult starting point for a field of collectives.


1.2.3 Other Social Science-Inspired Systems 

Some human economies provide examples of naturally occurring systems
that can be viewed as (more or less) well-performing collectives. The field
of economics provides much more, though. Both empirical economics (e.g.,
economic history, experimental economics) and theoretical economics (e.g.,
general equilibrium theory [4], theory of optimal taxation [164]) provide
a rich literature on strategic situations where many parties interact. In
fact, much of economics can be viewed as concerning how to maximize
certain constrained kinds of world utilities, when there are certain (very
strong) restrictions on the individual agents and their interactions, and in
particular when we have limited freedom in setting the utility functions of
those agents.

Mechanism Design 

One way to try to induce a large collective to reach an equilibrium point
without centralized control is via an auction.⁵ (This is the approach usu-
ally employed in computational markets; see below.) Along with optimal
taxation and public good theory [137], the design of auctions is the sub-
ject of the field of mechanism design. Broadly defined, mechanism design
is concerned with the incentives that must be applied to any set of agents
that interact and exchange goods [87, 164, 229] in order to get those agents
to exhibit desired behavior. Usually the desired behavior concerns pre-
specified ‘inherent’ utility functions of some sort for each of the individual
agents. In particular, mechanism design is often concerned with the incen-
tives that must be superimposed on such inherent utility functions to guide
the agents to a ‘(Pareto) efficient’ (or ‘Pareto optimal’) point, that is, to a
point in which no agent’s inherent utility can be improved without hurting
another agent’s inherent utility [86, 87].

⁵ We do not discuss general equilibrium theory here in detail because, though it deals
with the interaction among multiple markets to set the market-“clearing” price for the
goods, it is not appropriate for the study of collectives: it requires centralized control
(a Walrasian auctioneer), does not allow for dynamic interactions, and in general there is
no reason to believe that having the markets clear optimizes a world utility.

One particularly important type of such an incentive scheme is an auc- 
tion. When many agents interact in a common environment often there 
needs to be a structure that supports the exchange of goods or information 
among those agents. Auctions provide one such (centralized) structure for 
managing exchanges of goods. For example, in the English auction all the 
agents come together and ‘bid’ for a good, and the price of the good is 
increased until only one bidder remains, who gets the good in exchange for 
the resource bid. As another example, in the Dutch auction the price of a 
good is decreased until one buyer is willing to pay the current price. 
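Both auction formats can be simulated in a few lines. The sketch assumes bidders with fixed private valuations who behave naively: in the English auction a bidder stays in while the price does not exceed its valuation, and in the Dutch auction a bidder accepts as soon as the price falls to its valuation (real Dutch-auction bidders have an incentive to wait, i.e. to shade their bids). Under these assumptions the English winner pays roughly the second-highest valuation plus one increment, well below what it would have been willing to pay.

```python
def english_auction(valuations, increment=1.0):
    """Ascending auction: raise the price until only one bidder remains.

    Assumes non-negative valuations and naive bidders who stay in while the
    price is at most their valuation. Returns (winner, price paid)."""
    price = 0.0
    active = list(valuations)
    while len(active) > 1:
        next_price = price + increment
        still_in = [b for b in active if valuations[b] >= next_price]
        if not still_in:  # all remaining bidders drop out together: sell at current price
            break
        active, price = still_in, next_price
    return active[0], price

def dutch_auction(valuations, start, decrement=1.0):
    """Descending auction: lower the price until some bidder accepts it.

    Assumes naive bidders who accept once the price reaches their valuation."""
    price = start
    while price >= 0:
        takers = [b for b, v in valuations.items() if v >= price]
        if takers:
            return max(takers, key=valuations.get), price
        price -= decrement
    return None, 0.0

if __name__ == "__main__":
    bids = {"a": 10.0, "b": 7.0, "c": 4.0}  # hypothetical private valuations
    print(english_auction(bids))            # ('a', 8.0)
    print(dutch_auction(bids, start=20.0))  # ('a', 10.0)
```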

All auctions perform the same task: match supply and demand. As such,
auctions are one of the ways in which price equilibration among a set of
interacting agents can be achieved. However, very few world utilities have
their maximum occur at a point that is Pareto optimal for the pre-set in-
herent utility functions. Accordingly, unless we are very fortunate in the
relation between those inherent utility functions and (in general separately
specified) world utility, knowing how to induce such a Pareto optimal point
is of little value. For example, in a transaction in an English auction both
the seller and the buyer benefit. They may even have arrived at an allo-
cation which is efficient. However, in that the winner may well have been
willing to pay more for the good, such an outcome may confound the goal
of the market designer, if that designer’s goal is to maximize revenue. This
point is returned to below, in the context of computational economics.

Another, perhaps more intuitive, perspective is to view the restrictions
of mechanism design as concerning the private utility functions of the in-
dividual agents. Typically in mechanism design the private utility function
for each agent $\eta$, which maps states of the entire world (including the in-
ternal state of the agent itself) to $\mathbb{R}$, is of the form
$\gamma_\eta(x_{\eta,1}, x_{\eta,2}, \ldots, x_{\eta,n}, T_\eta(y_{\eta,1}, y_{\eta,2}, \ldots, y_{\eta,m}))$,
where $\gamma_\eta(.)$ is agent $\eta$'s pre-fixed inherent utility function, the
$x_{\eta,1}, x_{\eta,2}, \ldots, x_{\eta,n}$ constitute the first $n$ of the $n + k$ variables
that that function depends on, and $T_\eta(.)$ is the $\mathbb{R}^k$-valued “mechanism”
function the designer can set, the $y_{\eta,1}, y_{\eta,2}, \ldots, y_{\eta,m}$ being the variables
making up its arguments. Unlike the private utility, world utility can depend on
all of the $x_{\eta,1}, \ldots, x_{\eta,n}, y_{\eta,1}, y_{\eta,2}, \ldots, y_{\eta,m}$ directly (as well as on other
entirely different variables). As an example, the $y_{\eta,1}, \ldots, y_{\eta,m}$ could be
the set of all agents’ bids at an auction, $T_\eta(.)$ could be $\mathbb{R}^2$-valued, giving
the amount of change in $\eta$'s owned quantities of both money and the item
up for bid, and the $x_{\eta,1}, \ldots, x_{\eta,n}$ could parameterize $\eta$'s happiness trade-off
relating owned quantities of the good and of money.

Typically $\gamma_\eta(.)$ and the choice of what variables make up the arguments
$y_{\eta,1}, y_{\eta,2}, \ldots, y_{\eta,m}$ to $T_\eta$ are fixed a priori, with only the function $T_\eta(.)$ al-
lowed to vary in the design. In addition, often there are a priori restrictions
on the functional form of the $T_\eta$. For example, often the $T_\eta$ are not allowed
to vary with $\eta$. More precisely, usually they must be invariant under the
transformation $\eta \to \eta'$ in both the index to the function and the indices
to its arguments. This means in particular that the designer can’t “cheat”
and have the functional forms of the $T_\eta$ vary from one $\eta$ to another in a
way that reflects the variations across the (often pre-determined) associ-
ated vectors $(x_{\eta,1}, \ldots, x_{\eta,n})$. For example, typically an auction mechanism
determines who gets what goods for what price in a manner that is inde-
pendent of the identities of the bidding agents, and in particular does not
directly reflect any internal happiness trade-off parameters of the agents
that aren’t reflected in their bids.

From the perspective of a collective, these kinds of restrictions on private 
utilities only hold in a small subset of the potential computational prob- 
lems, and constitute a severe handicap in other scenarios. Another limita- 
tion of most of the work on mechanism design is that either it assumes a 
particular computational model for the agent, or (more commonly) focuses
on (game-theoretic) equilibria. This limited nature of the treatment of off- 
equilibrium scenarios is intimately related to the restrictions on the form of 
the private utility. If there are no restrictions on the private utilities, then 
there is a trivial solution for how to set such utilities to maximize the world 
utility at equilibrium: Have each such utility simply equal the world utility, 
in a so-called “team game”. To have the analysis be non-trivial, restrictions
like those on the private utilities are needed. 

In practice though, no real system is at a game-theoretic equilibrium, 
due to bounded rationality. In particular, this means that if one considers
mechanism design in the limiting case of no restrictions on the private
utilities, the associated “mechanism design solution” of a team game often
will result in poor performance [238]. Team theory [105, 153] is one approach
that has been tried to circumvent this problem. The idea there is to remove all notions of
a private or inherent utility, and solve directly for the strategy profile that 
will maximize the world utility. Needless to say though, such an approach 
becomes extraordinarily difficult for all but the simplest problems, and 
requires centralized, completely personalized control and communication, 
and exact modeling of the system’s dynamics. 
Computational Economics

‘Computational economies’ are schemes inspired by economics, and more
specifically by general equilibrium theory and mechanism design theory,
for managing the components of a distributed computational system. They
work by having a ‘computational market’, akin to an auction, guide the
interactions among those components. Such a market is defined as any 
structure that allows the components of the system to exchange information 
on relative valuation of resources (as in an auction), establish equilibrium 
states (e.g., determine market clearing prices) and exchange resources (i.e., 
engage in trades). 

Such computational economies can be used to investigate real economies 
and biological systems [31, 34, 35, 128]. They can also be used to de- 
sign distributed computational systems. For example, such computational 
economies are well-suited to some distributed resource allocation prob- 
lems, where each component of the system can either directly produce the 
“goods” it needs or acquire them through trades with other components.
Computational markets often allow for far more heterogeneity in the com- 
ponents than do conventional resource allocation schemes. Furthermore, 
there is both theoretical and empirical evidence suggesting that such mar- 
kets are often able to settle to equilibrium states. For example, auctions find 
prices that satisfy both the seller and the buyer which results in an increase 
in the utility of both (else one or the other would not have agreed to the 
sale). Assuming that all parties are free to pursue trading opportunities, 
such mechanisms move the system to a point where all possible bilateral
trades that could improve the utility of both parties are exhausted. 

Now restrict attention to the case, implicit in much of computational
market work, with the following characteristics: First, world utility can be
expressed as a monotonically increasing function $F$ where each argument
$i$ of $F$ can in turn be interpreted as the value of a pre-specified utility
function $f_i$ for agent $i$. Second, each of those $f_i$ is a function of an
$i$-indexed ‘goods vector’ $x_i$ of the non-perishable goods “owned” by agent $i$.
The components of that vector are $x_{ij}$, and the overall system dynamics is
restricted to conserve the vector $\sum_i x_i$, whose components are $\sum_i x_{ij}$.
(There are also some other, more technical conditions.) As an example, the
resource allocation problem can be viewed as concerning such vectors of
“owned” goods.

Due to the second of our two conditions, one can integrate a market- 
clearing mechanism into any system of this sort. Due to the first condition, 
since in a market equilibrium with non-perishable goods no (rational) agent 
ends up with a value of its utility function lower than the one it started with, 
the value of the world utility function must be higher at equilibrium than 
it was initially. In fact, so long as the individual agents are smart enough
to avoid all trades in which they do not benefit, any computational market
can only improve this kind of world utility, even if it does not achieve the 
market equilibrium. 
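The argument can be checked numerically under assumptions that satisfy both conditions. In the sketch below, three hypothetical agents each hold two non-perishable goods, each $f_i$ is taken to be the Cobb-Douglas utility sqrt(x_i1 * x_i2) (chosen only for illustration), and $F$ is their sum, which is monotonically increasing in each $f_i$. Random bilateral trades are proposed and executed only if both parties strictly benefit. Total holdings of each good are conserved, and the world utility never decreases, whether or not a market equilibrium is reached.

```python
import math
import random

def agent_utility(goods):
    """Hypothetical Cobb-Douglas utility f_i over an agent's two goods."""
    return math.sqrt(goods[0] * goods[1])

def world_utility(holdings):
    """F = sum of the f_i, which is monotonically increasing in each f_i."""
    return sum(agent_utility(g) for g in holdings)

def run_market(holdings, n_proposals=5000, step=0.1, seed=0):
    """Propose random bilateral trades; execute only those that strictly benefit both sides."""
    rng = random.Random(seed)
    holdings = [list(g) for g in holdings]  # copy so the caller's data is untouched
    history = [world_utility(holdings)]
    for _ in range(n_proposals):
        i, j = rng.sample(range(len(holdings)), 2)
        price = rng.uniform(0.0, 2.0)  # units of good 1 paid per unit of good 0
        new_i = [holdings[i][0] - step, holdings[i][1] + price * step]  # i sells good 0
        new_j = [holdings[j][0] + step, holdings[j][1] - price * step]  # j buys good 0
        if min(new_i + new_j) < 0:
            continue  # an agent cannot give away goods it does not own
        if (agent_utility(new_i) > agent_utility(holdings[i])
                and agent_utility(new_j) > agent_utility(holdings[j])):
            holdings[i], holdings[j] = new_i, new_j
        history.append(world_utility(holdings))
    return holdings, history

if __name__ == "__main__":
    start = [[9.0, 1.0], [1.0, 9.0], [5.0, 5.0]]  # hypothetical initial endowments
    final, history = run_market(start)
    print(round(history[0], 3), "->", round(history[-1], 3))
```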




This line of reasoning provides one of the main reasons to use computa- 
tional markets in those situations in which they can be applied. Conversely, 
it underscores one of the major limitations of such markets: Starting with 
an arbitrary world utility function with arbitrary dynamical restrictions, it 
may be quite difficult to cast that function as a monotonically increasing 
$F$ taking as arguments a set of agents’ goods-vector-based utilities $f_i$, if
we require that those $f_i$ be well-enough behaved that we can reasonably
expect the agents to optimize them in a market setting. 

One example of a computational economy being used for resource al-
location is Huberman and Clearwater’s use of a double-blind auction to
solve the complex task of controlling the temperature of a building. In this
case, each agent (individual temperature controller) bids to buy or sell cool
or warm air. This market mechanism leads to an equitable temperature
distribution in the system [116]. Other domains where market mechanisms
were successfully applied include purchasing memory in an operating sys-
tem [50], allocating virtual circuits [75], “stealing” unused CPU cycles
in a network of computers [69, 230], predicting option futures in financial
markets [185], and numerous scheduling and distributed resource allocation
problems [138, 142, 210, 218, 234, 235].

Computational economics can also be used for tasks not tightly coupled 
to resource allocation. For example, following the work of Maes [151] and 
Ferber [74], Baum shows how, by using computational markets, a large
number of agents can interact and cooperate to solve a variant of the blocks
world problem [22, 23]. However, market-based computational economics
relies on both centralized communication and centralized control to some 
degree, raising scalability issues. Furthermore, in practice, the applicability 
of computational economies depends greatly on the domain [225], making 
it a difficult starting point for a field of collectives. 


1.2.4 Biologically Inspired Systems 

Properly speaking, biological systems do not involve utility functions and
searches across them with learning algorithms. However, it has long been
appreciated that there are many ways in which viewing biological systems
as involving searches over such functions can lead to deeper understanding
of them [203, 244]. Conversely, some have argued that the mechanisms
underlying biological systems can be used to help design search algorithms
[109].⁶

These kinds of reasoning which relate utility functions and biological sys-
tems have traditionally focussed on the case of a single biological system
operating in some external environment. If we extend this kind of reason-
ing to a set of biological systems that are co-evolving with one another,
then we have essentially arrived at biologically-based collectives. This sec-
tion discusses some of the ways in which previous work in the literature
bears on this relationship between collectives and biology.

⁶ See [150, 236] though for some counter-arguments to the particular claims most
commonly made in this regard.

Population Biology and Ecological Modeling 

The fields of population biology and ecological modeling are concerned with
the large-scale “emergent” processes that govern the systems that consist
of many (relatively) simple entities interacting with one another [24, 99].
As usually cast, the “simple entities” are members of one or more species,
and the interactions are some mathematical abstraction of the process of
natural selection as it occurs in biological systems (involving processes like
genetic reproduction of various sorts, genotype-phenotype mappings, in-
ter- and intra-species competitions for resources, etc.). Population biology
and ecological modeling in this context address questions concerning the
dynamics of the resultant ecosystem, and in particular how its long-term
behavior depends on the details of the interactions between the constituent
entities. Broadly construed, the paradigm of ecological modeling can even
be broadened to study how natural selection and self-regulating feedback
create a stable planet-wide ecological environment: Gaia [144].

The underlying mathematical models of other fields can often be use- 
fully modified to apply to the kinds of systems population biology is in- 
terested in [14]. (See also the discussion in the game theory subsection 
above.) Conversely, the underlying mathematical models of population 
biology and ecological modeling can be applied to other non-biological 
systems. In particular, those models shed light on social issues such as 
the emergence of language or culture, warfare, and economic competition 
[71, 72, 88]. They also can be used to investigate more abstract issues 
concerning the behavior of large complex systems with many interacting 
components [89, 98, 156, 176, 184]. 

Going a bit further afield, an approach that is related in spirit to eco-
logical modeling is ‘computational ecologies’. These are large distributed
systems where each component of the system acting (seemingly) indepen-
dently results in complex global behavior. Those components are viewed as
constituting an “ecology” in an abstract sense (although much of the math-
ematics is not derived from the traditional field of ecological modeling). In
particular, one can investigate how the dynamics of the ecology is influenced
by the information available to each component and how cooperation and
communication among the components affects that dynamics [115, 117].

Although in some ways the most closely related to collectives of the cur-
rent ecology-inspired research, the fields of population biology and compu-
tational ecologies do not provide a full science of collectives. These fields
are primarily concerned with the “forward problem” of determining the
dynamics that arises from certain choices of the underlying system. Un-
less one’s desired dynamics is sufficiently close to some dynamics that was
previously catalogued (during one’s investigation of the forward problem),
one has very little information on how to set up the components and their
interactions to achieve that desired dynamics.

Swarm Intelligence 

The field of ‘swarm intelligence’ is concerned with systems that are modeled
after social insect colonies, so that the different components of the system 
are queen, worker, soldier, etc. It can be viewed as ecological modeling in 
which the individual entities have extremely limited computing capacity 
and/or action sets, and in which there are very few types of entities. The 
premise of the field is that the rich behavior of social insect colonies arises 
not from the sophistication of any individual entity in the colony, but from
the interaction among those entities. The objective of current research is 
to uncover kinds of interactions among the entity types that lead to pre- 
specified behavior of some sort. 

More speculatively, the study of social insect colonies may also provide 
insight into how to achieve learning in large distributed systems. This is 
because at the level of the individual insect in a colony, very little (or no) 
learning takes place. However across evolutionary time-scales the social 
insect species as a whole functions as if the various individual types in a 
colony had “learned” their specific functions. The “learning” is the direct 
result of natural selection. (See the discussion on this topic in the subsection 
on ecological modeling.) 

Swarm intelligences have been used to adaptively allocate tasks [33, 136], 
solve the traveling salesman problem [62, 63] and route data efficiently in 
dynamic networks [32, 201, 219], among others. However, there is no general
framework for adapting swarm intelligences to maximize particular world 
utility functions. Accordingly, such intelligences generally need to be hand- 
tailored for each application. 
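To make the idea of useful behavior emerging from simple local interactions concrete, here is a minimal sketch of an ant-colony-style heuristic on a tiny traveling salesman instance. It is not the algorithm of [62, 63]: the pheromone update rule and every parameter (alpha, beta, rho, q, the numbers of ants and iterations) are illustrative assumptions. Each simulated ant consults only the pheromone level and the length of the edges leaving its current city, yet the colony as a whole settles on short tours.

```python
import math
import random


def tour_length(tour, dist):
    """Length of a closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def ant_colony_tsp(points, n_ants=10, n_iters=50, alpha=1.0, beta=2.0,
                   rho=0.5, q=1.0, seed=0):
    """Ant-colony-style TSP heuristic (illustrative parameters; points must be distinct)."""
    rng = random.Random(seed)
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    tau = [[1.0] * n for _ in range(n)]  # pheromone level on each edge
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                candidates = sorted(unvisited)
                # Local rule: prefer edges with more pheromone and shorter length.
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in candidates]
                j = rng.choices(candidates, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            tours.append(tour)
        # Evaporation, then each ant reinforces the edges of its own tour,
        # depositing more pheromone on shorter tours.
        tau = [[(1.0 - rho) * t for t in row] for row in tau]
        for tour in tours:
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_tour, best_len


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tour, length = ant_colony_tsp(square)
print(tour, length)  # expected: a perimeter tour of length 4.0
```

Note that alpha, beta, and rho had to be chosen by hand for this instance; nothing in the sketch derives them from a stated world utility.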

1.2.5 Physics-Based Systems 

Statistical Physics 

Equilibrium statistical physics is concerned with the stable state character 
of large numbers of very simple physical objects, interacting according to 
well-specified local deterministic laws, with probabilistic noise processes
superimposed [6, 188]. Typically there is no sense in which such systems can 
be said to have centralized control, since all particles contribute comparably 
to the overall dynamics. 

Aside from mesoscopic statistical physics, the numbers of particles
considered are usually huge (e.g., 10^23), and the particles themselves are
extraordinarily simple, typically having only a few degrees of freedom.
Moreover, the noise processes usually considered are highly restricted, being
those that are formed by “baths” of heat, particles, and the like. Similarly,
almost all of the field restricts itself to deterministic laws that are
readily encapsulated in Hamilton’s equations (Schrödinger’s equation and
its field-theoretic variants for quantum statistical physics). In fact, much of
equilibrium statistical physics isn’t even concerned with the dynamic laws
by themselves (as, for example, the study of stochastic Markov processes is).
Rather, it is concerned with invariants of those laws (e.g., energy), invariants
that relate the states of all of the particles. Deterministic laws without such
readily discoverable invariants are outside of the purview of much of
statistical physics.

One potential use of statistical physics for collectives involves taking the 
systems that statistical physics analyzes, especially those analyzed in its 
condensed matter variant (e.g., spin glasses [213, 214]), as simplified mod- 
els of a class of collectives. This approach is used in some of the analyses of 
the El Farol Bar problem, also called the minority game (see below) [5, 48]. 
It is used more overtly in (for example) the work of Galam [90], in which
the equilibrium coalitions of a set of “countries” are modeled in terms of
spin glasses. This approach cannot provide a general collectives framework,
though, because it does not provide a general solution to arbitrary
collectives inversion problems, it is only concerned with the kinds of systems
discussed above, and it does not employ RL algorithms. 7
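The spin-glass view of a collective (see also footnote 7) can be made concrete with a small sketch: treat each spin as an agent whose “utility” is minus its own local energy, and let the agents take turns playing a best response. The couplings below are random and hypothetical, not a model from [90] or [213, 214]. The dynamics stop at a configuration where no single spin can gain by flipping, i.e., a Nash equilibrium in the single-flip sense, whose total energy may nonetheless lie above the ground state found by brute force.

```python
import itertools
import random


def energy(spins, J):
    """Total energy E = -sum_{i<j} J[i][j] * s_i * s_j, with each s_i in {-1, +1}."""
    n = len(spins)
    return -sum(J[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))


def local_field(spins, J, i):
    """Field on spin i from all the others; spin i's own energy is -s_i * h_i."""
    return sum(J[i][j] * spins[j] for j in range(len(spins)) if j != i)


def best_response_dynamics(spins, J):
    """Spins take turns flipping whenever that strictly lowers their own energy.
    Each such flip also lowers the total energy, so the loop terminates."""
    spins = list(spins)
    changed = True
    while changed:
        changed = False
        for i in range(len(spins)):
            if local_field(spins, J, i) * spins[i] < 0:
                spins[i] = -spins[i]
                changed = True
    return spins


def random_couplings(n, seed=0):
    """Symmetric Gaussian couplings with zero diagonal (a hypothetical instance)."""
    rng = random.Random(seed)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            J[i][j] = J[j][i] = rng.gauss(0.0, 1.0)
    return J


def ground_state_energy(J):
    """Brute-force minimum of E over all 2**n configurations (small n only)."""
    return min(energy(s, J) for s in itertools.product((-1, 1), repeat=len(J)))


J = random_couplings(10, seed=1)
rng = random.Random(2)
start = [rng.choice((-1, 1)) for _ in range(10)]
final = best_response_dynamics(start, J)
print(energy(final, J), ground_state_energy(J))  # equilibrium vs. global minimum
```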

Another contribution that statistical physics can make is with the
mathematical techniques it has developed for its own purposes, like mean field
theory, self-averaging approximations, phase transitions, Monte Carlo
techniques, the replica trick, and tools to analyze the thermodynamic limit in
which the number of particles goes to infinity. Although such techniques
have not yet been applied to collectives, they have been successfully
applied to related fields. This is exemplified by the use of the replica trick
to analyze two-player zero-sum games with random payoff matrices in the
thermodynamic limit of the number of strategies in [27]. Other examples
are the numeric investigation of iterated prisoner’s dilemma played on a
lattice [223], the analysis of stochastic games by expressing deviation
from rationality in the form of a “heat bath” [154], and the use of
topological entropy to quantify the complexity of a voting system studied in
[...].

Other quite recent work in the statistical physics literature is formally 
identical to that in other fields, but presents it from a novel perspective. 


7 In regard to the latter point however, it’s interesting to speculate about recasting 
statistical physics as a collective, by viewing each of the particles in the physical system 
as running an “RL algorithm” that perfectly optimizes the “utility function” of its
Lagrangian, given the “actions” of the other particles. In this perspective, many-particle
physical systems are multi-stage games that are at Nash equilibrium in each stage. So 
for example, a frustrated spin glass is such a system at a Nash equilibrium that is not 
Pareto optimal. 
A good example of this is [211], which is concerned with the problem of
controlling a spatially extended system with a single controller, by using an 
algorithm that is identical to a simple-minded proportional RL algorithm 
(in essence, a rediscovery of RL). 

Action Extremization 

Much of the theory of physics can be cast as solving for the extremization of
an actional, which is a functional of the worldline of an entire (potentially
many-component) system across all time. The solution to that extremization
problem constitutes the actual worldline followed by the system. In
this way the calculus of variations can be used to solve for the worldline
of a dynamic system. As an example, simple Newtonian dynamics can be
cast as solving for the worldline of the system that extremizes the time
integral of a quantity called the ‘Lagrangian’, which is a function of the
system’s state along that worldline and of certain parameters (e.g., the
‘potential energy’) governing the system at hand. In this instance, the
calculus of variations simply results in Newton’s laws.
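A minimal numerical illustration of this last point, under assumptions of our own choosing (a unit mass in a uniform gravitational potential V(x) = g x, fixed endpoints, and a simple time discretization): extremizing the discretized action one coordinate at a time recovers the parabolic trajectory that Newton’s law x'' = -g predicts. The function names and parameters are hypothetical.

```python
def stationary_path(x_start, x_end, total_time, n_steps, g=9.81, sweeps=5000):
    """Extremize S = sum_k [(x_{k+1} - x_k)**2 / (2 dt) - g * x_k * dt]
    over the interior points x_1..x_{n-1}, with x_0 and x_n held fixed."""
    dt = total_time / n_steps
    # Initial guess: a straight line between the fixed endpoints.
    x = [x_start + (x_end - x_start) * k / n_steps for k in range(n_steps + 1)]
    for _ in range(sweeps):
        for k in range(1, n_steps):
            # Solve dS/dx_k = 0 for x_k with its neighbours held fixed
            # (Gauss-Seidel relaxation; S is convex here, so this converges).
            x[k] = 0.5 * (x[k - 1] + x[k + 1] + g * dt * dt)
    return x


def newtonian_path(x_start, x_end, total_time, n_steps, g=9.81):
    """Closed-form solution of x'' = -g through the same two endpoints."""
    v0 = (x_end - x_start + 0.5 * g * total_time ** 2) / total_time
    dt = total_time / n_steps
    return [x_start + v0 * (k * dt) - 0.5 * g * (k * dt) ** 2
            for k in range(n_steps + 1)]


numeric = stationary_path(0.0, 0.0, 2.0, 20)
exact = newtonian_path(0.0, 0.0, 2.0, 20)
print(max(abs(a - b) for a, b in zip(numeric, exact)))  # tiny: the two agree
```

The stationarity condition solved in the inner loop, x_{k+1} - 2 x_k + x_{k-1} = -g dt^2, is Newton’s law with the acceleration replaced by a finite difference, which is why the two paths coincide at the grid points.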

If we take the dynamic system to be a collective, we are assured that its
worldline automatically optimizes a “global goal” consisting of the value of
the associated actional. If we change physical aspects of the system that
determine the functional form of the actional (e.g., change the system’s
potential energy function), then we change the global goal, and we are
assured that our collective optimizes that new global goal. Counter-intuitive
physical systems, like the strings-and-springs systems that exhibit Braess’
paradox [20], are simply systems for which the “world utility” implicit in
our human intuition is extremized at a point different from the one that
extremizes the system’s actional.

The challenge in exploiting this to solve the design of collectives problem
is in translating an arbitrary, provided global goal for the collective into a
parameterized actional. Note that the actional must govern the dynamics
of the collective, and the parameters of the actional must be physical
variables in the collective, variables whose values we can modify.

Active Walker Models 

The field of active walker models [21, 100, 101] is concerned with modeling
“walkers” (be they human walkers or instead simple physical objects)
crossing fields along trajectories, where those trajectories are a function
of several factors, including in particular the trails already worn into the
field. Often the kind of trajectories considered are those that can be cast
as solutions to actional extremization problems, so that the walkers can be
explicitly viewed as agents optimizing a private utility.

One of the primary concerns of the field of active walker models is how
the trails worn in the field change with time to reach a final equilibrium
state. The problem of how to design the cement pathways in the field
(and other physical features of the field) so that the final paths actually
followed by the walkers will have certain desirable characteristics is then
one of solving for parameters of the actional that will result in the desired
worldline. This is a special instance of the inverse problem of how to design
a collective.

Using active walker models this way to design collectives, like action 
extremization in general, probably has limited applicability. Also, it is not 
clear how robust such a design approach might be, or whether it would be
scalable and exempt from the need for hand-tailoring. 

1.2.6 Other Related Subjects 

This subsection presents a “catch-all” of other fields that have little in
common with one another and that, while either still nascent or not extremely
closely related to collectives, bear some relation to collectives.

Stochastic Fields 

An extremely well-researched body of work concerns the mathematical and
numeric behavior of systems for which the probability distribution over
possible future states conditioned on preceding states is explicitly
provided. This work involves many aspects of Monte Carlo numerical
algorithms [172], all of Markov chains [80, 177, 215], and especially Markov
fields, a topic that encompasses the Chapman-Kolmogorov equations [91]
and their variants: Liouville’s equation, the Fokker-Planck equation, and the
detailed-balance equation in particular. Non-linear dynamics is also related
to this body of work (see the synopsis of iterated function systems below
and the synopsis of cellular automata above), as are Markov competitive
decision processes (see the synopsis of game theory above).

Formally, one can cast the problem of designing a collective as how to fix 
each of the conditional transition probability distributions of the individual 
elements of a stochastic field so that the aggregate behavior of the overall 
system is of a desired form. 8 
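In the simplest conceivable case, where the elements of the field do not interact at all, this inverse problem has a closed form. The sketch below rests on that assumption (uncoupled two-state elements, which real collectives rarely are) and uses hypothetical function names: it picks each element’s off-to-on transition probability so that the expected fraction of elements in the “on” state settles at a target value, then checks the result by propagating each element’s marginal distribution forward.

```python
def design_p_on(target, p_off):
    """Choose P(off -> on) so that a two-state chain with P(on -> off) = p_off
    has stationary probability `target` of being on: p_on / (p_on + p_off) = target."""
    p_on = target * p_off / (1.0 - target)
    if not 0.0 < p_on <= 1.0:
        raise ValueError("target not reachable with this p_off")
    return p_on


def evolve_on_probability(p0, p_on, p_off, steps):
    """Propagate P(on) forward using the element's own transition probabilities."""
    p = p0
    for _ in range(steps):
        p = p * (1.0 - p_off) + (1.0 - p) * p_on
    return p


target = 0.3
p_offs = [0.1, 0.25, 0.5, 0.75, 0.9]   # hypothetical, fixed per element
p_ons = [design_p_on(target, q) for q in p_offs]
final = [evolve_on_probability(0.0, a, b, steps=500) for a, b in zip(p_ons, p_offs)]
print(sum(final) / len(final))  # aggregate fraction "on", approximately 0.3
```

Once the elements are coupled, each element’s transition probabilities depend on its neighbors’ states, the stationary distribution no longer factorizes, and no such per-element formula is available in general.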

Amorphous computing and Control of Smart Matter 

Amorphous computing grew out of the idea of replacing traditional computer
design, with its requirements for high reliability of the components of


8 In contrast, in the field of Markov decision processes, discussed in [45], the full sys- 
tem may be a Markov field, but the system designer only sets the conditional transition 
probability distribution of a few of the field elements at most, to the appropriate “deci- 
sion rules”. Unfortunately, it is hard to imagine how to use the results of this field to de- 
sign collectives because of major scaling problems. Any decision process must accurately 
model likely future modifications to its own behavior — often an extremely daunting 
task [150]. What’s worse, if multiple such decision processes are running concurrently in 
the system, each such process must also model the others, potentially needing to model 
them in their full complexity. 
the computer, with a novel approach in which widespread unreliability of
those components would not interfere with the computation [2, 1]. Some of
its more speculative aspects are concerned with “how to program” a
massively distributed, noisy system of components which may consist in part
of biochemical and/or biomechanical components [131, 233]. Work here has
tended to focus on schemes for how to robustly induce desired geometric
dynamics across the physical body of the amorphous computer, issues
that are closely related to morphogenesis and thereby lend credence to the
idea that biochemical components are a promising approach.

Especially in its limit of computers with very small constituent components,
amorphous computing is also closely related to the field of nanotechnology
[64]. As the prospect of nanotechnology-driven mechanical systems
gets more concrete, the daunting problem of how to robustly control, power,
and sustain protean systems made up of extremely large sets of nano-scale
devices looms more important [95, 96, 107]. If this problem were to be
solved, one would in essence have “smart matter”. For example, one would
be able to “paint” an airplane wing with such matter and have it improve
drag and lift properties significantly.


Self Organizing Systems 

The concept of self-organization and self-organized criticality [15] was origi- 
nally developed to help understand why many distributed physical systems 
are attracted to critical states that possess long-range dynamic correla- 
tions in the large-scale characteristics of the system. It provides a powerful 
framework for analyzing both biological and economic systems. For
example, natural selection (particularly punctuated equilibrium [68, 93]) can
be likened to a self-organizing dynamical system, and some have argued it
shares many of the properties (e.g., scale invariance) of such systems [57].
Similarly, one can view the economic order that results from the actions of
human agents as a case of self-organization [59]. The relationship between
complexity and self-organization is a particularly important one, in that it
provides the potential laws that allow order to arise from chaos [125].
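The canonical toy model of self-organized criticality is the Bak-Tang-Wiesenfeld sandpile. The sketch below is our own minimal version (grid size, number of grains, and function names are assumptions): grains are dropped at random sites, and any site holding four or more grains topples, passing one grain to each of its neighbors, with grains pushed off the edge lost. No parameter is tuned toward a critical point, yet the pile drives itself into a state in which a single added grain can trigger avalanches of widely varying size.

```python
import random


def relax(grid):
    """Topple every site holding >= 4 grains until the pile is stable.
    Returns the number of topplings (the avalanche size)."""
    n = len(grid)
    avalanche = 0
    stack = [(i, j) for i in range(n) for j in range(n) if grid[i][j] >= 4]
    while stack:
        i, j = stack.pop()
        if grid[i][j] < 4:
            continue
        grid[i][j] -= 4
        avalanche += 1
        for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= a < n and 0 <= b < n:  # grains pushed off the edge are lost
                grid[a][b] += 1
                if grid[a][b] >= 4:
                    stack.append((a, b))
        if grid[i][j] >= 4:
            stack.append((i, j))
    return avalanche


def drive(n=10, grains=5000, seed=0):
    """Add grains one at a time at random sites, relaxing after each one."""
    rng = random.Random(seed)
    grid = [[0] * n for _ in range(n)]
    sizes = []
    for _ in range(grains):
        grid[rng.randrange(n)][rng.randrange(n)] += 1
        sizes.append(relax(grid))
    return grid, sizes


grid, sizes = drive()
# Largest avalanche seen, and how many added grains caused no toppling at all.
print(max(sizes), sum(s == 0 for s in sizes))
```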


Adaptive Control Theory 

Adaptive control [7, 196], and in particular adaptive control involving 
locally weighted RL algorithms [9, 165], constitute a broadly applicable 
framework for controlling small, potentially inexactly modeled systems. 
Augmented by techniques in the control of chaotic systems [52, 60, 61], 
they constitute a very successful way of solving the “inverse problem” for 
such systems. Unfortunately, it is not clear how one could even attempt to 
scale such techniques up to the massively distributed systems of interest in 
collectives. 
1.3 COIN Framework

The previous section provided a summary of different fields that address 
various issues pertinent to the field of collectives. In this section, we sum- 
marize the COIN (Collective Intelligence) framework, which is one of the 
first frameworks that aims to bridge the gap between the needs of the field 
of collectives and the extant analysis/design methods. 9 


1.3.1 Central Equation 

Let Z be an arbitrary vector space whose elements z give the joint move of 
all agents in the system (i.e., z specifies the full “worldline” consisting of 
the actions/states of all the agents). The world utility G(z) is a function
of the full worldline, and we are concerned with the problem of finding the 
z that maximizes G(z). 

In addition to G, for each agent $\eta$, there is a private utility function
$g_\eta$. The agents act to improve their individual private utility functions,
even though we, as system designers, are only concerned with the value
of the world utility G. To specify all agents other than $\eta$, we will use the
notation $\hat{\eta}$.

Our uncertainty concerning the behavior of the system is reflected in a 
probability distribution over Z. Our ability to control the system consists 
of setting the value of some characteristic of the agents, e.g., setting the 
private functions of the agents. Indicating that value by s, our analysis 
revolves around the following central equation for P(G | s), which follows
from Bayes’ theorem:

$$P(G \mid s) = \int d\epsilon_G \, P(G \mid \epsilon_G, s) \int d\epsilon_g \, P(\epsilon_G \mid \epsilon_g, s) \, P(\epsilon_g \mid s), \qquad (1.1)$$

where $\epsilon_g$ is the vector of the “intelligences” of the agents with respect to
their associated private functions, and $\epsilon_G$ is the vector of the intelligences
of the agents with respect to G. Intuitively, the $\eta$ component of each of these
vectors indicates what percentage of $\eta$’s possible actions would have resulted in
lower utility. 10 In this chapter, we use intelligence vectors as decomposition
variables for Equation 1.1.

Note that $\epsilon_{g_\eta}(z) = 1$ means that player $\eta$ is fully rational at z, in that its
move maximizes the value of its utility, given the moves of the other players. In
other words, a point z where $\epsilon_{g_\eta}(z) = 1$ for all players $\eta$ is one that meets
the definition of a game-theory Nash equilibrium. On the other hand, a
z at which all components of $\epsilon_G$ equal 1 is a local maximum of G (or more
precisely, a critical point of the G(z) surface). So if we can get these two
vectors to be identical, then if the agents do well enough at maximizing

9 The full COIN theory is presented in Chapter 2. 

10 Intelligence is formally defined in Chapter 2.
their private utilities, we are assured we will be near a local maximum of
G.
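As a concrete and deliberately simplified reading of these definitions, the sketch below computes one agent’s intelligence with respect to a utility as the fraction of its possible moves that would not have done better than its actual move, holding the other agents’ moves fixed. The uniform weighting over moves, the treatment of ties, and the toy utility are assumptions; the formal definition is given in Chapter 2. With this convention a best-responding agent scores exactly 1, so a joint move at which every agent scores 1 with respect to its private utility is a Nash equilibrium.

```python
def intelligence(utility, z, agent, moves):
    """Fraction of `agent`'s possible moves whose utility, with the other
    agents' moves in z held fixed, does not exceed the utility of z itself."""
    current = utility(z)
    not_better = 0
    for move in moves:
        alt = list(z)
        alt[agent] = move
        if utility(tuple(alt)) <= current:
            not_better += 1
    return not_better / len(moves)


def G(z):
    """Toy world utility (an assumption): best when the moves sum to 3."""
    return -(sum(z) - 3) ** 2


moves = (0, 1, 2)
for z in [(1, 1, 1), (0, 0, 0)]:
    # Team game: each agent's private utility is G itself, so its private
    # intelligence and its intelligence with respect to G coincide.
    print(z, [intelligence(G, z, agent, moves) for agent in range(3)])
```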

To formalize this, consider our decomposition of P(G | s). If we can
choose s so that the third conditional probability in the integrand,
$P(\epsilon_g \mid s)$, is peaked around vectors $\epsilon_g$ all of whose components are close to 1
(that is, agents are able to “learn” their tasks), then we have likely induced
large private utility intelligences. If we can also have the second term,
$P(\epsilon_G \mid \epsilon_g, s)$, be peaked about $\epsilon_G$ equal to $\epsilon_g$ (that is, the private and world
utilities are aligned), then $\epsilon_G$ will also be large. Finally, if the first term in
the integrand, $P(G \mid \epsilon_G, s)$, is peaked about high G when $\epsilon_G$ is large, then
our choice of s will likely result in high G, as desired.


1.3.2 Factoredness and Learnability 

For high values of G to be achieved, the private utility functions need
to have two properties. 11 First, the private utility functions need to be
“aligned with G”, a need that is expressed in the second term of
Equation 1.1. In particular, regardless of the details of the stochastic
environment in which the agents operate, or of the details of the learning
algorithms of the agents, if $\epsilon_g$ equals $\epsilon_G$ exactly for all z, the desired form for
the second term in Equation 1.1 is assured. We call such a system factored.
In game theory language, the private utility function Nash equilibria of a
factored system are local maxima of G. In addition to this desirable
equilibrium behavior, factored systems also automatically provide appropriate
off-equilibrium incentives to the agents (an issue generally not considered
in the game theory / mechanism design literature).

Second, we want the agents’ private utility functions to have high
learnability, intuitively meaning that an agent’s utility should be sensitive to
its own actions and insensitive to actions of others. This requirement that
private utility functions have high “signal-to-noise” arises in the third term.
As an example, consider a “team game” where the private utility functions
are set to G [56]. Such a system is tautologically factored. However, team
games often have low learnability, because in a large system an agent will
have a difficult time discerning the effects of its actions on G. As a
consequence, each $\eta$ may have difficulty achieving high $g_\eta$ in such a system.
Loosely speaking, agent $\eta$’s learnability is the ratio of the sensitivity of $g_\eta$
to $\eta$’s actions to the sensitivity of $g_\eta$ to the actions of all other agents. So


11 Non-game theory-based function maximization techniques like simulated annealing
instead address how to have term 1 have the desired form. They do this by trying to
ensure that the local maxima that the underlying system ultimately settles near have
high G, by “trading off exploration and exploitation”. One can combine such term-1-based
techniques with the techniques presented here. The resultant hybrid algorithm,
addressing all three terms, outperforms simulated annealing by over two orders of
magnitude [240].
at a given state z, the higher the learnability, the more $g_\eta(z)$ depends on
the move of agent $\eta$, i.e., the better the associated signal-to-noise ratio for
$\eta$. Intuitively then, higher learnability means it is easier for $\eta$ to achieve a
large value of its utility.

1.3.3 Difference Utilities 

It is possible to solve for the set of all private utilities that are factored
with respect to a particular world utility. Unfortunately, in general it is not
possible for a collective both to be factored and to have perfect learnability
for all of its players (i.e., no dependence of any $g_\eta$ on any agent other than
$\eta$) [238]. However, consider difference utilities, which are of the form:

$$DU_\eta(z) = G(z) - \Gamma(f(z)), \qquad (1.2)$$

where $\Gamma(f(z))$ is independent of $z_\eta$. Such difference utilities are factored [238].
In addition, under usually benign approximations, learnability is maximized
over the set of difference utilities by choosing

$$\Gamma(f(z)) = E(G \mid z_{\hat{\eta}}, s), \qquad (1.3)$$

up to an overall additive constant. We call the resultant difference utility
the Aristocrat utility (AU). If each player $\eta$ uses an appropriately rescaled
version of the associated AU as its private utility function, then we have
ensured good form for both terms 2 and 3 in Equation 1.1.
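For a finite set of moves, Equation 1.3 can be evaluated directly by averaging G over the agent’s own moves while holding everyone else’s fixed. The sketch below does this under two assumptions of our own: a toy, separable world utility, and a uniform distribution standing in for the agent’s move probabilities given s. It illustrates both properties claimed above: changing the agent’s own move changes AU by exactly as much as it changes G (factoredness), while changing the other agents’ moves can shift G without shifting AU (the “noise” that hurts a team game’s learnability is removed).

```python
def aristocrat_utility(G, z, agent, moves, probs):
    """AU for `agent` (Eq. 1.3): G(z) minus the expectation of G over the
    agent's own move, drawn from `probs`, with all other moves held fixed."""
    expected = 0.0
    for move, p in zip(moves, probs):
        alt = list(z)
        alt[agent] = move
        expected += p * G(tuple(alt))
    return G(z) - expected


def separable_G(z):
    """Hypothetical world utility: every agent is rewarded for choosing move 1."""
    return -sum((zi - 1) ** 2 for zi in z)


moves, probs = (0, 1, 2), (1 / 3, 1 / 3, 1 / 3)
for z in [(1, 0, 2), (1, 1, 1)]:
    # Same move for agent 0, different moves for the others:
    # G changes, agent 0's AU does not.
    print(z, separable_G(z), aristocrat_utility(separable_G, z, 0, moves, probs))
```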

Using AU in practice is sometimes difficult, due to the need to evaluate
the expectation value. Fortunately there are other utility functions that,
while being easier to evaluate than AU, still are both factored and possess
superior learnability to the team game utility, $g_\eta = G$. One such private
utility function is the Wonderful Life Utility (WLU). The WLU for player
$\eta$ is parameterized by a pre-fixed clamping parameter $CL_\eta$ chosen from
among $\eta$’s possible moves:

$$WLU_\eta = G(z) - G(z_{\hat{\eta}}, CL_\eta). \qquad (1.4)$$

WLU is factored no matter what the choice of clamping parameter.
Furthermore, while not matching the high learnability of AU, WLU usually has
far better learnability than does a team game, because most of the “noise”
due to other agents is removed from $\eta$’s utility. Therefore, WLU generally
results in better performance than do team game utilities [228, 238].

Figure 1.1 provides an example of clamping. As in that example, in many 
circumstances there is a particular choice of clamping parameter for agent $\eta$
that is a “null” move for that agent, equivalent to removing that agent from
the system. For such a clamping parameter, WLU is closely related to the
economics technique of “endogenizing a peer’s (agent’s) externalities”, for 
example with the Groves mechanism [174, 175, 87]. 
FIGURE 1.1. This example shows the impact of the clamping operation on the
joint state of a four-agent system where each agent has three possible actions,
and each such action is represented by a three-dimensional unary vector. The
first matrix represents the joint state of the system z where agent 1 has selected
action 1, agent 2 has selected action 3, agent 3 has selected action 1 and agent
4 has selected action 2. The second matrix displays the effect of clamping agent
2’s action to the “null” vector (i.e., replacing $z_{\eta_2}$ with $\vec{0}$). The third matrix
shows the effect of instead clamping agent 2’s move to the “average” action
vector $\vec{a} = (.33, .33, .33)$, which amounts to replacing that agent’s move with the
“illegal” move of fractionally taking each possible move ($z_{\eta_2} = \vec{a}$).


However, it is usually the case that using WLU with a clamping parameter
that is as close as possible to the expected move defining AU results in far
higher learnability than does clamping to the null move. Such a WLU
is roughly akin to a mean-field approximation to AU. 12 For example, in
Fig. 1.1, if the probabilities of player 2 making each of its possible moves
were 1/3, then one would expect that a clamping parameter of $\vec{a}$ would be
close to optimal. Accordingly, in practice use of such an alternative WLU
derived as a “mean-field approximation” to AU almost always results in
far better values of G than does the “endogenizing” WLU.
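The clamping operation of Figure 1.1 and Equation 1.4 can be sketched directly on the unary joint-state matrix. The world utility here is a hypothetical congestion-style function chosen only for illustration, as is its capacity value; the joint state, the null vector, and the average vector (written with exact thirds rather than .33) follow the figure. Whichever clamp is used, a change in agent 2’s own move changes its WLU by the same amount as it changes G, which is the factoredness property stated above.

```python
import math


def world_utility(z, capacity=1.5):
    """Hypothetical G on a joint state z (one row per agent, one column per
    action): each action's total load x contributes x * exp(-x / capacity)."""
    loads = [sum(row[a] for row in z) for a in range(len(z[0]))]
    return sum(x * math.exp(-x / capacity) for x in loads)


def one_hot(action, n_actions=3):
    """Unary vector for a 0-indexed action."""
    return [1.0 if a == action else 0.0 for a in range(n_actions)]


def clamp(z, agent, vector):
    """Copy of z with `agent`'s row replaced by `vector` (z itself is unchanged)."""
    return [list(vector) if i == agent else list(row) for i, row in enumerate(z)]


def wlu(z, agent, clamp_vector, G=world_utility):
    """Wonderful Life Utility (Eq. 1.4): G(z) minus G with the agent clamped."""
    return G(z) - G(clamp(z, agent, clamp_vector))


# Figure 1.1: agents 1-4 chose actions 1, 3, 1, 2 (0-indexed here: 0, 2, 0, 1).
z = [one_hot(a) for a in (0, 2, 0, 1)]
null_move = [0.0, 0.0, 0.0]
average_move = [1 / 3, 1 / 3, 1 / 3]
# Agent 2 of the figure is row index 1.
print(wlu(z, 1, null_move), wlu(z, 1, average_move))
```

Which clamp gives the better learnability depends on how agent 2 actually distributes its moves; the sketch only shows that both choices preserve factoredness.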

Intuitively, collectives having factored and highly learnable private
utilities like AU and WLU can be viewed as akin to well-run human companies.
G is the “bottom line” of the company, the players $\eta$ are identified with the
employees of that company, and the associated $g_\eta$ are given by the employees’
performance-based compensation packages. For example, for a “factored
company”, each employee’s compensation package contains incentives
designed such that the better the bottom line of the corporation, the greater
the employee’s compensation. As an example, the CEO of a company
wishing to have the private utilities of the employees be factored with G may
give stock options to the employees. The net effect of this action is to ensure
that what is good for the employee is also good for the company. In
addition, if the compensation packages are “highly learnable”, the employees
will have a relatively easy time discerning the relationship between their
behavior and their compensation. In such a case the employees will both
have the incentive to help the company and be able to determine how best
to do so. Note that in practice, providing stock options is usually more
effective in small companies than in large ones. This makes perfect sense
in terms of the formalism summarized above, since such options generally
have higher learnability in small companies than they do in large
companies, in which each employee has a hard time seeing how his/her moves
affect the company’s stock price.

12 Formally, our approximation is exact only if the expected value of G equals G
evaluated at the expected joint move (both expectations being conditioned on given moves
by all players other than $\eta$). In general though, for relatively smooth G, we would expect
such a mean-field approximation to AU to give good results, even if the approximation
does not hold exactly.

1.3.4 Summary of COIN Results to Date 

In earlier work, we tested the WLU for distributed control of network
packet routing [241], achieving substantially better throughput than by
using the best possible shortest-path-algorithm (SPA) based system [241], even
though that SPA-based system has information denied the agents in the
WLU-based collective. In related work we have shown that use of the WLU
automatically avoids the infamous Braess’ paradox, in which adding new links
can actually decrease throughput, a situation that readily ensnares SPAs
[228, 239].

We have also applied the WLU to the problem of controlling communication
across a constellation of satellites so as to minimize the importance-weighted
loss of scientific data flowing across that constellation [237]. We
have also shown that agents using utility functions derived from the COIN
framework significantly improve performance in the problem of job
scheduling across a heterogeneous computing grid [227].

In addition, we have explored COIN-based techniques on variants of
congestion games [238, 242, 243], in particular on a more challenging variant
of Arthur’s El Farol bar attendance problem [5] (also known as the
“minority game” [48]). In this work we showed that use of the WLU can result
in performance orders of magnitude superior to that of team game
utilities. We have also successfully applied COIN techniques to the problem of
coordinating a set of autonomous rovers so as to maximize the
importance-weighted value of a set of locations they visit [226].

Finally we have also explored applying COIN techniques to problems that 
are explicitly cast as search. These include setting the states of the spins in
a spin glass to minimize energy, the conventional bin-packing problem of
computer science, and a model of human agents connected in a small-world
network who have to synchronize their purchase decisions [240].
1.4 Applications/Problems Driving Collectives

The previous sections focused on fields that provide solutions to problems 
arising in the field of collectives. To complement them, in this section we 
present three problems that are particularly suited to being approached 
from the field of collectives, and that provide fertile ground for testing 
novel theories of collectives. 


1.4.1 El Farol Bar Problem (Minority Game)

The “El Farol” bar problem (also known as the minority game) and its 
variants provide a clean and simple testbed for investigating certain kinds
of interactions among agents [5, 44, 47, 206]. In the original version of the
problem, which arose in economics, at each time step (each “night”), each
agent needs to decide whether to attend a particular bar. The goal of the
agent in making this decision depends on the total attendance at the bar on
that night. If the total attendance is below a preset capacity then the agent
should have attended. Conversely, if the bar is overcrowded on the given
night, then the agent should not attend. (Because of this structure, the
bar problem with capacity set to 50% of the total number of agents is also
known as the ‘minority game’; each agent selects one of two groups at each
time step, and those that are in the minority have made the right choice.)
The agents make their choices by predicting ahead of time whether the 
attendance on the current night will exceed the capacity and then taking 
the appropriate course of action. 

What makes this problem particularly interesting is that it is impossible
for each agent to be perfectly “rational”, in the sense of correctly
predicting the attendance on any given night. This is because if most agents
predict that the attendance will be low (and therefore decide to attend),
the attendance will actually be high, while if they predict the attendance will
be high (and therefore decide not to attend) the attendance will be low.
(In the language of game theory, this essentially amounts to the property
that there are no pure strategy Nash equilibria [49, 246].) Alternatively,
viewing the overall system as a collective, it has a Prisoner’s Dilemma-like
nature, in that “rational” behavior by all the individual agents thwarts the
global goal of maximizing total enjoyment (defined as the sum of all agents’
enjoyment and maximized when the bar is exactly at capacity).

This frustration effect is a crisp example of the difficulty that can arise
when agents try to model agents that are in their turn modeling the first
agents. It is similar to what occurs in spin glasses in physics, and makes
the bar problem closely related to the physics of emergent behavior in
distributed systems [46, 47, 48, 248]. Researchers have also studied the
dynamics of the bar problem to investigate economic properties like
competition, cooperation and collective behavior and especially their
relationship to market efficiency [58, 122, 197].
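Two of the observations above, that total enjoyment peaks when the bar is exactly at capacity and that agents acting on a common forecast defeat it, can be illustrated in a few lines. The per-night utility x·exp(−x/c) used below is an assumed form, chosen only because it peaks exactly when attendance x equals capacity c. The second function gives every agent the same deterministic predictor (“tonight will look like last night”), a hypothetical rule; because everyone acts on the same forecast, the forecast is always falsified and attendance flips between empty and full.

```python
import math


def night_utility(attendance, capacity):
    """Assumed enjoyment for one night; maximal at attendance == capacity."""
    return attendance * math.exp(-attendance / capacity)


def shared_predictor_run(n_agents, capacity, nights, last_attendance=0):
    """Every agent predicts tonight's attendance equals last night's and
    attends iff that prediction is below capacity. Returns the attendance
    history and how many nights the shared prediction was right."""
    history, correct = [], 0
    for _ in range(nights):
        predicted_uncrowded = last_attendance < capacity
        attendance = n_agents if predicted_uncrowded else 0
        if (attendance < capacity) == predicted_uncrowded:
            correct += 1
        history.append(attendance)
        last_attendance = attendance
    return history, correct


best = max(range(11), key=lambda x: night_utility(x, 6))
history, correct = shared_predictor_run(n_agents=10, capacity=6, nights=6)
print(best, history, correct)  # 6 [10, 0, 10, 0, 10, 0] 0
```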


1.4.2 Data Routing in a Network

Packet routing in a data network [28, 110, 212, 231, 127, 94] presents a
particularly interesting domain for the investigation of collectives. In par- 
ticular, with such routing: 

(i) the problem is inherently distributed; 

(ii) for all but the most trivial networks it is impossible to employ global
control;

(iii) the routers have access only to local information (routing tables);

(iv) it constitutes a relatively clean and easily modified experimental testbed; 
and 

(v) there are potentially major bottlenecks induced by ‘greedy’ behavior 
on the part of the individual routers, which behavior constitutes a readily 
investigated instance of the Tragedy Of the Commons (TOC). 

Many of the approaches to packet routing incorporate a variant on RL [39, 
43, 51, 147, 152]. Q-routing is perhaps the best known such approach and 
is based on routers using reinforcement learning to select the best path [39]. 
Although generally successful, Q-routing is not a general scheme for invert- 
ing a global task. This is even true if one restricts attention to the problem 
of routing in data networks — there exists a global task in such problems, 
but that task is directly used to construct the algorithm. 
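The core Q-routing idea can be sketched as follows, in a simplified form under our own assumptions: fixed link delays with no queues, and repeated sweeps over every link rather than updates triggered by individual packets (Q-routing proper learns from the packets it actually forwards and also accounts for queueing time). Q[x][y] is router x’s estimate of the time to deliver a packet to the destination if x forwards it to neighbor y; each update pulls that estimate toward the link delay plus y’s own best estimate. The graph is hypothetical.

```python
def q_routing(graph, dest, sweeps=300, lr=0.5):
    """graph maps each node to {neighbour: link delay}. Returns Q, where
    Q[x][y] estimates delivery time to `dest` when x forwards via y."""
    Q = {x: {y: 0.0 for y in nbrs} for x, nbrs in graph.items()}
    for _ in range(sweeps):
        for x, nbrs in graph.items():
            if x == dest:
                continue
            for y, delay in nbrs.items():
                # y reports its best remaining-time estimate (0 once delivered).
                remaining = 0.0 if y == dest else min(Q[y].values())
                Q[x][y] += lr * (delay + remaining - Q[x][y])
    return Q


graph = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "C": 1.0, "D": 1.0},
    "C": {"A": 1.0, "B": 1.0, "D": 5.0},
    "D": {"B": 1.0, "C": 5.0},
}
Q = q_routing(graph, dest="D")
next_hop = min(Q["A"], key=Q["A"].get)
print(next_hop, Q["A"][next_hop])  # B, approximately 2.0 (A -> B -> D)
```

Each router uses only its own table and its neighbors’ reported estimates, which is what makes the scheme distributed; the global objective (short delivery times) is nonetheless built directly into the update rule, as the text notes.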

A particular version of the general packet routing problem that is ac- 
quiring increased attention is the Quality of Service (QoS) problem, where 
different communication packets (voice, video, data) share the same band- 
width resource but have widely varying importances both to the user and 
(via revenue) to the bandwidth provider. Determining which packet has
precedence over which other packets in such cases is not only based on
priority in arrival time but more generally on the potential effects on the 
income of the bandwidth provider. In this context, RL algorithms have 
been used to determine routing policy, control call admission and maxi- 
mize revenue by allocating the available bandwidth efficiently [43, 152]. 
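As a much simpler stand-in for the RL policies cited above, the sketch
below uses a fixed, hypothetical rule: packets are served in order of
revenue per unit of bandwidth, and arrival time is used only to break
ties. The function `schedule`, the tuple layout and all revenue figures
are assumptions for illustration; the point is only that such a policy
lets a later, more valuable packet overtake an earlier one.

```python
import heapq


def schedule(packets):
    """Return packet names in transmission order (hypothetical revenue-aware rule).

    Each packet is (name, arrival_time, revenue, size). Higher revenue per
    unit of bandwidth is served first; arrival time only breaks ties.
    """
    heap = [(-revenue / size, arrival, name) for name, arrival, revenue, size in packets]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


# Illustrative numbers only: (name, arrival time, revenue, size in kB).
packets = [
    ("data-1", 0.0, 1.0, 10.0),   # 0.1 revenue per kB
    ("video-1", 0.5, 4.0, 20.0),  # 0.2 revenue per kB
    ("voice-1", 1.0, 3.0, 5.0),   # 0.6 revenue per kB
    ("data-2", 1.5, 1.0, 10.0),   # 0.1 revenue per kB, arrives after data-1
]
print(schedule(packets))  # → ['voice-1', 'video-1', 'data-1', 'data-2']
```

An RL-based scheme would instead learn which packets (or calls) to admit
and route from observed revenue, rather than relying on a hand-set ratio
like this one.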

Many researchers have exploited the noncooperative game-theoretic
understanding of the TOC to explain the bottleneck character of empirical
data networks’ behavior and to suggest potential alternatives to current
routing schemes [25, 67, 132, 133, 139, 141, 179, 180, 208]. Closely
related is work on various “pricing”-based resource allocation strategies
in congestible data networks [149]. This work is at least partially based
upon current understanding of pricing in toll lanes, and of traffic flow
in general (see below).⁷ All of these approaches are of particular
interest when combined with the RL-based schemes mentioned just above.
Due to these factors, much of the current research on a general framework
for collectives is directed toward the packet-routing domain (see next
section).
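The effect of pricing on a TOC can be illustrated with Pigou’s classic
two-link example, which is our choice of illustration rather than a model
taken from [149]. One unit of traffic chooses between link 1, with a
constant delay of 1, and link 2, whose delay equals the fraction x of
traffic using it. Selfish choice puts everyone on link 2 (average delay 1),
whereas the system optimum splits traffic evenly (average delay 3/4).
Adding the marginal-cost toll x·l′(x) = x to link 2 makes the selfish
equilibrium coincide with the optimum. Measuring tolls in delay units and
the helper names below are assumptions of this sketch.

```python
def equilibrium_share(perceived_cost_2, cost_1=1.0, tol=1e-12):
    """Fraction x of traffic on link 2 at a selfish (Wardrop) equilibrium.

    Link 1 always costs cost_1. Drivers keep switching to link 2 while it
    looks cheaper, so we bisect for the x where the perceived costs match.
    Assumes perceived_cost_2 is increasing in x.
    """
    if perceived_cost_2(1.0) <= cost_1:
        return 1.0  # link 2 never looks worse: everyone takes it
    if perceived_cost_2(0.0) >= cost_1:
        return 0.0  # link 2 never looks better: nobody takes it
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if perceived_cost_2(mid) < cost_1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def latency_2(x):
    return x  # congestible link: delay equals the fraction of traffic using it


def total_latency(x):
    # Average delay over all traffic: share x pays latency_2(x), the rest pays 1.
    return x * latency_2(x) + (1.0 - x) * 1.0


# Without tolls, drivers see only the delay.
x_free = equilibrium_share(latency_2)
# With the marginal-cost toll x * latency_2'(x) = x added on link 2.
x_toll = equilibrium_share(lambda x: latency_2(x) + x)

print(x_free, total_latency(x_free))                      # → 1.0 1.0
print(round(x_toll, 6), round(total_latency(x_toll), 6))  # → 0.5 0.75
```

The toll is collected, not burned, so in this accounting it changes only
what drivers perceive, while the delay actually experienced drops from 1
to 3/4.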


1.4.3 Traffic Theory

Traffic congestion typifies the Tragedy of the Commons public good
problem: everyone wants to use the same resource, and the greedy attempts
of all parties to optimize their use of that resource not only worsen
global behavior but also worsen their own private utility (e.g., if
everyone disobeys traffic lights, everyone gets stuck in traffic jams).
Indeed, in the well-known Braess’ paradox [20, 54, 55, 134], opening a
new traffic path while keeping everything else constant, including the
number and destinations of the drivers, can increase everyone’s travel
time to their destination. (If the overall system is viewed as an
instance of the Prisoner’s Dilemma, this paradox in essence arises
through the creation of a novel ‘defect-defect’ option.) Greedy behavior
on the part of individuals also results in very rich global dynamic
patterns, such as stop-and-go waves and clusters [102, 103].
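Braess’ paradox is easiest to see on the standard four-node textbook
network, used here as an illustrative assumption rather than an example
drawn from [20, 54, 55, 134]. Drivers travel from s to t; links s→A and
B→t have a delay equal to the fraction of drivers using them, links A→t
and s→B have a fixed delay of 1, and the newly opened shortcut A→B has
zero delay. Before the shortcut, drivers split evenly and each spends 1.5.
Once it opens, taking it is individually better for everyone (the
‘defect’ move), and the resulting equilibrium costs every driver 2. The
helpers `route_costs` and `is_equilibrium` are hypothetical names.

```python
def route_costs(flows, routes, edge_cost):
    """Cost of every route, given the fraction of drivers on each used route."""
    load = {}
    for route, f in flows.items():
        for edge in routes[route]:
            load[edge] = load.get(edge, 0.0) + f
    return {r: sum(edge_cost[e](load.get(e, 0.0)) for e in routes[r]) for r in routes}


def is_equilibrium(flows, costs, tol=1e-9):
    # Wardrop condition: every route carrying traffic is among the cheapest.
    best = min(costs.values())
    return all(costs[r] <= best + tol for r, f in flows.items() if f > 0)


edge_cost = {
    ("s", "A"): lambda x: x,    # congestible: delay grows with load
    ("A", "t"): lambda x: 1.0,  # fixed delay
    ("s", "B"): lambda x: 1.0,  # fixed delay
    ("B", "t"): lambda x: x,    # congestible
    ("A", "B"): lambda x: 0.0,  # the newly opened zero-delay shortcut
}
routes_before = {
    "upper": [("s", "A"), ("A", "t")],
    "lower": [("s", "B"), ("B", "t")],
}
routes_after = dict(routes_before, shortcut=[("s", "A"), ("A", "B"), ("B", "t")])

before = route_costs({"upper": 0.5, "lower": 0.5}, routes_before, edge_cost)
# The old even split is no longer stable: the shortcut would cost only 1.0.
old_split_after = route_costs({"upper": 0.5, "lower": 0.5}, routes_after, edge_cost)
after = route_costs({"shortcut": 1.0}, routes_after, edge_cost)

print(before["upper"], after["shortcut"])  # → 1.5 2.0
```

In the new equilibrium no driver can gain by switching back, since both
old routes now also cost 2, yet everyone is worse off than before the
road was built.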

Much of traffic theory employs and investigates tools that have
previously been applied in statistical physics [102, 129, 130, 183, 187]
(see subsection above). In particular, the spontaneous formation of
traffic jams provides a rich testbed for studying the emergence of
complex activity from seemingly chaotic states [102, 104]. Furthermore,
the dynamics of traffic flow are particularly amenable to the application
and testing of many novel numerical methods in a controlled environment
[16, 29, 202]. Many experimental studies have confirmed the usefulness of
applying insights gleaned from such work to real-world traffic scenarios
[102, 170, 169].
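The spontaneous formation of jams can be reproduced by very small models.
The sketch below implements one widely used example, the
Nagel-Schreckenberg cellular automaton on a ring road; it is our choice of
illustration, not a method the survey attributes to the works cited above.
Each car accelerates, brakes to the gap ahead, and with probability
`p_slow` randomly slows by one cell; that random braking is what seeds
stop-and-go waves. The parameter values (`v_max=5`, `p_slow=0.3`, 30 cars
on 100 cells) and the helper names are illustrative assumptions.

```python
import random


def nasch_step(positions, speeds, road_length, v_max=5, p_slow=0.3, rng=random):
    """One parallel update of the Nagel-Schreckenberg model on a ring road.

    positions must be distinct and sorted; car i follows car i + 1 (wrapping).
    """
    n = len(positions)
    new_speeds = []
    for i in range(n):
        # Number of empty cells between car i and the car ahead of it.
        gap = (positions[(i + 1) % n] - positions[i] - 1) % road_length
        v = min(speeds[i] + 1, v_max)  # 1. accelerate toward the speed limit
        v = min(v, gap)                # 2. brake so as not to hit the car ahead
        if v > 0 and rng.random() < p_slow:
            v -= 1                     # 3. random slowdown (the source of jams)
        new_speeds.append(v)
    # 4. Move every car simultaneously, wrapping around the ring.
    new_positions = [(x + v) % road_length for x, v in zip(positions, new_speeds)]
    # Cars never overtake, so re-sorting only rotates the list after wrap-around.
    order = sorted(range(n), key=lambda k: new_positions[k])
    return [new_positions[k] for k in order], [new_speeds[k] for k in order]


def simulate(n_cars=30, road_length=100, steps=200, seed=1):
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(road_length), n_cars))
    speeds = [0] * n_cars
    stopped_counts = []  # cars standing still at each step: a crude jam indicator
    for _ in range(steps):
        positions, speeds = nasch_step(positions, speeds, road_length, rng=rng)
        stopped_counts.append(speeds.count(0))
    return positions, speeds, stopped_counts


positions, speeds, stopped_counts = simulate()
print(max(stopped_counts[100:]))  # stopped cars late in the run (varies with seed)
```

Setting `p_slow=0.0` removes the randomness; at low densities the model
then settles into free flow, which is one way to see that the jams come
from individual braking fluctuations rather than from any bottleneck in
the road.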


1.5 Challenge Ahead 

Unfortunately, though they provide valuable insight into some aspects of
collectives, none of the fields discussed above can be modified to
encompass systems meeting all of the requirements of a “field” of
collectives. This is not too surprising, since none of those fields was
explicitly designed to design or analyze collectives; rather, each
touched on certain aspects of collectives.

What is needed is a fundamentally new look at this field, one that,
though it may borrow from the various fields, does not simply extend an
existing field that was not meant to analyze general collectives. There
are many directions in which future work on collectives can and will
proceed. It is a vast and rich area of research, and understanding the
interactions among the various fields is essential in forging new
directions.